Initial processing

This document describes the analysis of TCR repertoires for the manuscript “Unique roles of coreceptor-bound LCK in helper and cytotoxic T cells” by Horkova et al. The analysis workflow included:

  • sorting of CD4 and CD8 cells from lymph nodes and thymi of LckWT, LckCA and LckCA/KR mice,
  • isolation of RNA using RNA Clean & Concentrator-5 kit
  • preparation of TCR-enriched libraries using the NEBNext® Mouse Immune Sequencing Kit
  • sequencing the libraries on Illumina Miseq, 300bp paired-end reads
  • demultiplexing reads in Illumina BaseSpace
  • processing fastq files using the recommended pRESTO workflow on Galaxy with default parameters
  • aligning processed fastq files using MiXCR

Raw data are deposited in the Sequence Read Archive (PRJNA872031). Processed files can be downloaded from Zenodo. The Zenodo archive contains the following files, which are needed to run this script:

  • merged outputs from MiXCR merged_TCR_repertoires.csv
  • metadata file metadata_Lck.csv
  • count tables with raw counts of CDR3 TRA and TRB sequences in each sample count_table_tra.csv, count_table_trb.csv
  • TRA and TRB repertoires prepared for processing with the Immunarch package immdata_tra.rds, immdata_trb.rds

Let’s read the required files:

#
merged_repertoires <- read.csv("merged_TCR_repertoires.csv")
md <- read.csv("metadata_Lck.csv")

Count tables

We will start with processing of the file merged_TCR_repertoires.csv. This file contains outputs of MiXCR software, i.e., all the assembled CDR3 clonotypes, their counts in each sample, nucleotide and amino-acid sequences, all V, D and J alignments with scores, together with some meta-data. the column num_id contains info about the sample number, which we will frequently use to attach meta-data to sample numbers.

Here, we summarize the variables in the file merged_TCR_repertoires.csv:

kable(ExpData(data=merged_repertoires, type=2)) %>%
  kable_styling(full_width = F, font_size = 11, 
                bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Index Variable_Name Variable_Type Sample_n Missing_Count Per_of_Missing No_of_distinct_values
1 cloneId integer 1001983 0 0.000 68138
2 cloneCount integer 1001983 0 0.000 120
3 cloneFraction numeric 1001983 0 0.000 994
4 targetSequences character 1001983 0 0.000 823904
5 targetQualities character 1001983 0 0.000 746125
6 allVHitsWithScore character 1001983 0 0.000 93452
7 allDHitsWithScore character 566670 435313 0.434 327
8 allJHitsWithScore character 1001983 0 0.000 6929
9 allCHitsWithScore character 955 1001028 0.999 122
10 allVAlignments character 1001983 0 0.000 14932
11 allDAlignments character 566670 435313 0.434 13102
12 allJAlignments character 1001983 0 0.000 17988
13 allCAlignments character 144 1001839 1.000 1
14 nSeqCDR3 character 1001983 0 0.000 823904
15 minQualCDR3 integer 1001983 0 0.000 70
16 nSeqFR4 logical 0 1001983 1.000 0
17 aaSeqCDR3 character 1001983 0 0.000 541554
18 num_id integer 1001983 0 0.000 28
19 chain character 1001983 0 0.000 2
20 Mouse_strain character 1001983 0 0.000 3
21 Cell_type character 1001983 0 0.000 2
22 Organ character 1001983 0 0.000 2
23 Cell_count numeric 1001983 0 0.000 28
24 Exp integer 1001983 0 0.000 3
25 INDEX_i7 integer 1001983 0 0.000 12
26 INDEX_i5 integer 1001983 0 0.000 3

First, we will generate summary count tables for TRA and TRB CDR3 sequences (these tables are provided as Supplementary table S1 in the manuscript). We will reorder the file so that there will be one column for each sample, and we will change the num_id of the samples to meaningful names.

count_table_tra <- merged_repertoires %>% 
  dplyr::filter(chain == "TRA") %>% 
  dplyr::select(aaSeqCDR3, num_id, cloneCount)

counts_tra <- count_table_tra %>% 
  pivot_wider(names_from = num_id, values_from = cloneCount, values_fill = 0, values_fn = sum)

md2_tra <- merged_repertoires %>% filter(chain == "TRA") %>% group_by(aaSeqCDR3) %>% slice_head(n = 1)

excel_count_table_tra <- counts_tra %>% left_join(md2_tra %>% select(allVHitsWithScore, allDHitsWithScore, allJHitsWithScore))

md_to_join <- md %>% mutate(new_name = paste(Cell_type, Organ, Mouse_strain, paste0("Exp0",Exp)))  %>% select(num_id, new_name) %>% arrange(num_id)

order <- match(as.numeric(colnames(counts_tra)[2:29]), md_to_join$num_id)

colnames(excel_count_table_tra)[2:29] <- pull(md_to_join, new_name)[order]
excel_count_table_tra2 <- excel_count_table_tra %>% dplyr::select(1, 31, 32, 30, 22, 12, 25, 11, 6, 26, 13, 18, 4, 28, 17, 14, 8, 23, 9, 7, 16, 24, 2, 21, 5, 29, 10, 20, 19, 3, 15, 27)

Here, we show the resulting table for TRA:

kable(ExpData(data=excel_count_table_tra2, type=2)) %>%
  kable_styling(full_width = F, font_size = 11, 
                bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Index Variable_Name Variable_Type Sample_n Missing_Count Per_of_Missing No_of_distinct_values
1 aaSeqCDR3 character 114760 0 0.000 114760
2 allDHitsWithScore character 4769 109991 0.958 110
3 allJHitsWithScore character 114760 0 0.000 2980
4 allVHitsWithScore character 114760 0 0.000 31200
5 CD4 Lymph nodes CA Exp01 integer 114760 0 0.000 28
6 CD4 Lymph nodes CA Exp02 integer 114760 0 0.000 22
7 CD4 Lymph nodes CA Exp03 integer 114760 0 0.000 37
8 CD4 Lymph nodes CAKR Exp02 integer 114760 0 0.000 26
9 CD4 Lymph nodes CAKR Exp03 integer 114760 0 0.000 35
10 CD4 Lymph nodes WT Exp02 integer 114760 0 0.000 29
11 CD4 Lymph nodes WT Exp03 integer 114760 0 0.000 35
12 CD4 Thymus CA Exp01 integer 114760 0 0.000 15
13 CD4 Thymus CA Exp03 integer 114760 0 0.000 32
14 CD4 Thymus CAKR Exp01 integer 114760 0 0.000 14
15 CD4 Thymus CAKR Exp02 integer 114760 0 0.000 14
16 CD4 Thymus CAKR Exp03 integer 114760 0 0.000 21
17 CD4 Thymus WT Exp02 integer 114760 0 0.000 31
18 CD4 Thymus WT Exp03 integer 114760 0 0.000 27
19 CD8 Lymph nodes CA Exp02 integer 114760 0 0.000 21
20 CD8 Lymph nodes CA Exp03 integer 114760 0 0.000 23
21 CD8 Lymph nodes CAKR Exp02 integer 114760 0 0.000 17
22 CD8 Lymph nodes CAKR Exp03 integer 114760 0 0.000 25
23 CD8 Lymph nodes WT Exp01 integer 114760 0 0.000 26
24 CD8 Lymph nodes WT Exp02 integer 114760 0 0.000 31
25 CD8 Lymph nodes WT Exp03 integer 114760 0 0.000 30
26 CD8 Thymus CA Exp01 integer 114760 0 0.000 9
27 CD8 Thymus CA Exp03 integer 114760 0 0.000 23
28 CD8 Thymus CAKR Exp02 integer 114760 0 0.000 10
29 CD8 Thymus CAKR Exp03 integer 114760 0 0.000 11
30 CD8 Thymus WT Exp01 integer 114760 0 0.000 22
31 CD8 Thymus WT Exp02 integer 114760 0 0.000 9
32 CD8 Thymus WT Exp03 integer 114760 0 0.000 30
count_table_trb <- merged_repertoires %>% 
  dplyr::filter(chain == "TRB") %>% 
  dplyr::select(aaSeqCDR3, num_id, cloneCount)

counts_trb <- count_table_trb %>% 
  pivot_wider(names_from = num_id, values_from = cloneCount, values_fill = 0, values_fn = sum)

md2_trb <- merged_repertoires %>% 
  filter(chain == "TRB") %>% 
  group_by(aaSeqCDR3) %>% 
  slice_head(n = 1)

excel_count_table_trb <- counts_trb %>% 
  left_join(md2_trb %>% 
              select(allVHitsWithScore, allDHitsWithScore, allJHitsWithScore))

md_to_join <- md %>% 
  mutate(new_name = paste(Cell_type, Organ, Mouse_strain, paste0("Exp0",Exp))) %>% 
  select(num_id, new_name) %>% 
  arrange(num_id)

order <- match(as.numeric(colnames(counts_trb)[2:29]), md_to_join$num_id)

colnames(excel_count_table_trb)[2:29] <- pull(md_to_join, new_name)[order]

excel_count_table_trb2 <- excel_count_table_trb %>% 
  dplyr::select(1, 30:32, 22, 12, 25, 11, 6, 26, 13, 18, 4, 28, 17, 14, 8, 23, 9, 7, 16, 24, 2, 21, 5, 29, 10, 20, 19, 3, 15, 27)

Here, we show the resulting table for TRB:

kable(ExpData(data=excel_count_table_trb2, type=2)) %>%
  kable_styling(full_width = F, font_size = 11, 
                bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Index Variable_Name Variable_Type Sample_n Missing_Count Per_of_Missing No_of_distinct_values
1 aaSeqCDR3 character 426800 0 0.000 426800
2 allVHitsWithScore character 426800 0 0.000 23763
3 allDHitsWithScore character 331672 95128 0.223 214
4 allJHitsWithScore character 426800 0 0.000 2168
5 CD4 Lymph nodes CA Exp01 integer 426800 0 0.000 29
6 CD4 Lymph nodes CA Exp02 integer 426800 0 0.000 21
7 CD4 Lymph nodes CA Exp03 integer 426800 0 0.000 33
8 CD4 Lymph nodes CAKR Exp02 integer 426800 0 0.000 20
9 CD4 Lymph nodes CAKR Exp03 integer 426800 0 0.000 38
10 CD4 Lymph nodes WT Exp02 integer 426800 0 0.000 30
11 CD4 Lymph nodes WT Exp03 integer 426800 0 0.000 25
12 CD4 Thymus CA Exp01 integer 426800 0 0.000 24
13 CD4 Thymus CA Exp03 integer 426800 0 0.000 42
14 CD4 Thymus CAKR Exp01 integer 426800 0 0.000 19
15 CD4 Thymus CAKR Exp02 integer 426800 0 0.000 14
16 CD4 Thymus CAKR Exp03 integer 426800 0 0.000 39
17 CD4 Thymus WT Exp02 integer 426800 0 0.000 17
18 CD4 Thymus WT Exp03 integer 426800 0 0.000 15
19 CD8 Lymph nodes CA Exp02 integer 426800 0 0.000 24
20 CD8 Lymph nodes CA Exp03 integer 426800 0 0.000 25
21 CD8 Lymph nodes CAKR Exp02 integer 426800 0 0.000 24
22 CD8 Lymph nodes CAKR Exp03 integer 426800 0 0.000 32
23 CD8 Lymph nodes WT Exp01 integer 426800 0 0.000 22
24 CD8 Lymph nodes WT Exp02 integer 426800 0 0.000 32
25 CD8 Lymph nodes WT Exp03 integer 426800 0 0.000 23
26 CD8 Thymus CA Exp01 integer 426800 0 0.000 9
27 CD8 Thymus CA Exp03 integer 426800 0 0.000 20
28 CD8 Thymus CAKR Exp02 integer 426800 0 0.000 14
29 CD8 Thymus CAKR Exp03 integer 426800 0 0.000 14
30 CD8 Thymus WT Exp01 integer 426800 0 0.000 16
31 CD8 Thymus WT Exp02 integer 426800 0 0.000 9
32 CD8 Thymus WT Exp03 integer 426800 0 0.000 20

In addition to count matrices, we will also need normalized frequency tables, in which we will show the percentage of the repertoire occupied by each CDR3 sequence in a particular sample. We will first just normalize the count matrices we created earlier.

For TRA, we will also remove canonical NKT-cell CDR3 sequences TRAV11-TRAJ18 before normalization.

Here comes the normalized table for TRA:

excel_count_table_tra3 <- excel_count_table_tra2 %>% 
  mutate(nkt_trav11_traj18 = if_else(
    (grepl(allVHitsWithScore, pattern = "TRAV11") & 
       (grepl(allJHitsWithScore, pattern = "TRAJ18"))),"yes","no"))

excel_count_table_tra4 <- excel_count_table_tra3 %>% 
  filter(nkt_trav11_traj18 == "no")

count_table_tra4 <- as.matrix(excel_count_table_tra4[,5:32])
rownames(count_table_tra4) <- excel_count_table_tra4$aaSeqCDR3

tra4_norm <- scale(count_table_tra4, center=FALSE, scale=colSums(count_table_tra4))

prop.table.tra <- cbind(tra4_norm, 
                        excel_count_table_tra4 %>% 
                          select(-starts_with("CD")) ) 

prop.table.tra2 <- prop.table.tra %>% 
  dplyr::select(29, 31, 30, 32, 1:28)

data.table(prop.table.tra2 %>% head) 
##         aaSeqCDR3 allJHitsWithScore allDHitsWithScore
## 1: CAASASSGSWQLIF  TRAJ22*00(303.6)              <NA>
## 2:   CAASNMGYKLTF   TRAJ9*00(284.4)              <NA>
## 3: CAVSASSGSWQLIF  TRAJ22*00(304.2)              <NA>
## 4:  CAASDNYAQGLTF    TRAJ26*00(284)              <NA>
## 5: CALSDRYNQGKLIF  TRAJ23*00(272.3)              <NA>
## 6: CAASDDTNAYKVIF  TRAJ30*00(295.7)              <NA>
##                                                                                       allVHitsWithScore
## 1: TRAV14N-2*00(1026.1),TRAV14D-2*00(1017.8),TRAV14D-1*00(973.4),TRAV14N-1*00(973.4),TRAV14-2*00(944.4)
## 2:                                                                 TRAV7-2*00(731.5),TRAV7D-2*00(731.5)
## 3:                                              TRAV3D-3*00(975.2),TRAV3N-3*00(975.2),TRAV3-3*00(944.1)
## 4:                      TRAV14D-1*00(878.2),TRAV14N-1*00(878.2),TRAV14D-2*00(848.2),TRAV14N-2*00(848.2)
## 5:                                                            TRAV12D-3*00(1377.3),TRAV12N-3*00(1377.3)
## 6:                                                                                   TRAV14-1*00(966.7)
##    CD4 Lymph nodes CA Exp01 CD4 Lymph nodes CA Exp02 CD4 Lymph nodes CA Exp03
## 1:             2.727925e-03             0.0035273369             3.841094e-03
## 2:             1.148600e-03             0.0016166961             1.104972e-03
## 3:             7.178751e-04             0.0007348618             3.157064e-04
## 4:             2.871500e-04             0.0000000000             3.157064e-04
## 5:             7.178751e-05             0.0001469724             4.209419e-04
## 6:             0.000000e+00             0.0000000000             5.261773e-05
##    CD4 Lymph nodes CAKR Exp02 CD4 Lymph nodes CAKR Exp03
## 1:               0.0027958993               2.668321e-03
## 2:               0.0013979497               8.687558e-04
## 3:               0.0004659832               5.584859e-04
## 4:               0.0007455732               1.861620e-04
## 5:               0.0000000000               6.205399e-05
## 6:               0.0003727866               1.241080e-04
##    CD4 Lymph nodes WT Exp02 CD4 Lymph nodes WT Exp03 CD4 Thymus CA Exp01
## 1:             0.0035501049             3.291020e-03        0.0028328612
## 2:             0.0008068420             1.097007e-03        0.0012876642
## 3:             0.0002420526             2.611921e-04        0.0005150657
## 4:             0.0001613684             1.567152e-04        0.0000000000
## 5:             0.0000806842             5.223842e-05        0.0000000000
## 6:             0.0000000000             1.567152e-04        0.0002575328
##    CD4 Thymus CA Exp03 CD4 Thymus CAKR Exp01 CD4 Thymus CAKR Exp02
## 1:        0.0022079929          0.0013199578          0.0037400655
## 2:        0.0013247958          0.0013199578          0.0000000000
## 3:        0.0003679988          0.0000000000          0.0023375409
## 4:        0.0003679988          0.0005279831          0.0000000000
## 5:        0.0000000000          0.0000000000          0.0000000000
## 6:        0.0000000000          0.0000000000          0.0004675082
##    CD4 Thymus CAKR Exp03 CD4 Thymus WT Exp02 CD4 Thymus WT Exp03
## 1:          0.0033652602        0.0025633010        2.775089e-03
## 2:          0.0001294331        0.0013754298        5.808325e-04
## 3:          0.0005177323        0.0005001563        3.872217e-04
## 4:          0.0000000000        0.0003125977        3.872217e-04
## 5:          0.0001294331        0.0000000000        6.453695e-05
## 6:          0.0000000000        0.0003751172        1.936108e-04
##    CD8 Lymph nodes CA Exp02 CD8 Lymph nodes CA Exp03 CD8 Lymph nodes CAKR Exp02
## 1:             0.0099369387             0.0093881515               0.0080917874
## 2:             0.0018154023             0.0020233085               0.0019323671
## 3:             0.0021975922             0.0018614438               0.0008454106
## 4:             0.0009554749             0.0009711881               0.0007246377
## 5:             0.0011465698             0.0008902557               0.0016908213
## 6:             0.0012421173             0.0008093234               0.0006038647
##    CD8 Lymph nodes CAKR Exp03 CD8 Lymph nodes WT Exp01 CD8 Lymph nodes WT Exp02
## 1:               0.0096842588              0.009602698              0.011932808
## 2:               0.0022348289              0.002492303              0.001946123
## 3:               0.0020056157              0.001905879              0.001894909
## 4:               0.0009741562              0.001612667              0.001177917
## 5:               0.0015471893              0.003665152              0.003021612
## 6:               0.0004011231              0.001319455              0.000870634
##    CD8 Lymph nodes WT Exp03 CD8 Thymus CA Exp01 CD8 Thymus CA Exp03
## 1:             0.0097454014        0.0062111801        0.0073130455
## 2:             0.0021318066        0.0014614541        0.0025163167
## 3:             0.0019490803        0.0014614541        0.0008649839
## 4:             0.0012790839        0.0010960906        0.0004718094
## 5:             0.0024363503        0.0003653635        0.0005504443
## 6:             0.0007918139        0.0007307271        0.0009436188
##    CD8 Thymus CAKR Exp02 CD8 Thymus CAKR Exp03 CD8 Thymus WT Exp01
## 1:          0.0084643289          0.0065963061        0.0088170463
## 2:          0.0008061266          0.0015077271        0.0020992967
## 3:          0.0016122531          0.0009423294        0.0019943319
## 4:          0.0012091898          0.0011307953        0.0015744726
## 5:          0.0004030633          0.0005653977        0.0016794374
## 6:          0.0016122531          0.0016961930        0.0007347539
##    CD8 Thymus WT Exp02 CD8 Thymus WT Exp03
## 1:        0.0053846154         0.007650815
## 2:        0.0015384615         0.001761339
## 3:        0.0015384615         0.001541171
## 4:        0.0023076923         0.001431088
## 5:        0.0019230769         0.001155878
## 6:        0.0003846154         0.001265962

Here comes the normalized table for TRA:

excel_count_table_trb3 <- excel_count_table_trb2 %>% 
  mutate(nkt_trav11_traj18 = "no")

count_table_trb4 <- as.matrix(excel_count_table_trb3[,5:32])
rownames(count_table_trb4) <- excel_count_table_trb3$aaSeqCDR3


trb4_norm <- scale(count_table_trb4, center=FALSE, scale=colSums(count_table_trb4))

prop.table.trb <- cbind(trb4_norm, excel_count_table_trb %>% select(-starts_with("CD")) ) 
colnames(prop.table.trb)
##  [1] "CD4 Lymph nodes CA Exp01"   "CD4 Lymph nodes CA Exp02"  
##  [3] "CD4 Lymph nodes CA Exp03"   "CD4 Lymph nodes CAKR Exp02"
##  [5] "CD4 Lymph nodes CAKR Exp03" "CD4 Lymph nodes WT Exp02"  
##  [7] "CD4 Lymph nodes WT Exp03"   "CD4 Thymus CA Exp01"       
##  [9] "CD4 Thymus CA Exp03"        "CD4 Thymus CAKR Exp01"     
## [11] "CD4 Thymus CAKR Exp02"      "CD4 Thymus CAKR Exp03"     
## [13] "CD4 Thymus WT Exp02"        "CD4 Thymus WT Exp03"       
## [15] "CD8 Lymph nodes CA Exp02"   "CD8 Lymph nodes CA Exp03"  
## [17] "CD8 Lymph nodes CAKR Exp02" "CD8 Lymph nodes CAKR Exp03"
## [19] "CD8 Lymph nodes WT Exp01"   "CD8 Lymph nodes WT Exp02"  
## [21] "CD8 Lymph nodes WT Exp03"   "CD8 Thymus CA Exp01"       
## [23] "CD8 Thymus CA Exp03"        "CD8 Thymus CAKR Exp02"     
## [25] "CD8 Thymus CAKR Exp03"      "CD8 Thymus WT Exp01"       
## [27] "CD8 Thymus WT Exp02"        "CD8 Thymus WT Exp03"       
## [29] "aaSeqCDR3"                  "allVHitsWithScore"         
## [31] "allDHitsWithScore"          "allJHitsWithScore"
prop.table.trb2 <- prop.table.trb %>% dplyr::select(29, 31, 30, 32, 1:28)


data.table(prop.table.trb2 %>% head)
##         aaSeqCDR3 allDHitsWithScore                     allVHitsWithScore
## 1:   CASSDSAETLYF              <NA> TRBV13-3*00(755.1),TRBV13-1*00(664.5)
## 2:     CASSDAEQFF              <NA>                    TRBV13-1*00(819.6)
## 3:   CASSDAGYEQYF              <NA>                   TRBV13-1*00(1283.8)
## 4: CASSDWGGYAEQFF      TRBD2*00(45)                      TRBV13-1*00(676)
## 5: CASSDSGGQDTQYF      TRBD2*00(35)                    TRBV13-3*00(886.3)
## 6:   CASRDRNTEVFF      TRBD1*00(30)                      TRBV13-3*00(979)
##    allJHitsWithScore CD4 Lymph nodes CA Exp01 CD4 Lymph nodes CA Exp02
## 1:   TRBJ2-3*00(245)             1.729847e-04             5.154108e-05
## 2: TRBJ2-1*00(219.2)             0.000000e+00             0.000000e+00
## 3:   TRBJ2-7*00(215)             0.000000e+00             5.154108e-05
## 4:   TRBJ2-1*00(235)             7.413631e-05             5.154108e-05
## 5:   TRBJ2-5*00(235)             2.471210e-05             0.000000e+00
## 6: TRBJ1-1*00(232.7)             1.482726e-04             5.154108e-05
##    CD4 Lymph nodes CA Exp03 CD4 Lymph nodes CAKR Exp02
## 1:             1.141010e-04               3.152287e-05
## 2:             9.508415e-05               0.000000e+00
## 3:             3.803366e-05               0.000000e+00
## 4:             1.901683e-05               1.260915e-04
## 5:             3.803366e-05               3.152287e-05
## 6:             5.705049e-05               9.456861e-05
##    CD4 Lymph nodes CAKR Exp03 CD4 Lymph nodes WT Exp02 CD4 Lymph nodes WT Exp03
## 1:               2.747190e-04             2.637618e-05             2.293368e-04
## 2:               2.289325e-05             5.275235e-05             2.293368e-05
## 3:               6.867975e-05             2.637618e-04             2.293368e-05
## 4:               0.000000e+00             0.000000e+00             0.000000e+00
## 5:               0.000000e+00             0.000000e+00             0.000000e+00
## 6:               0.000000e+00             0.000000e+00             6.880103e-05
##    CD4 Thymus CA Exp01 CD4 Thymus CA Exp03 CD4 Thymus CAKR Exp01
## 1:        2.366397e-04        6.682259e-05          8.731718e-05
## 2:        7.887991e-05        4.454839e-05          0.000000e+00
## 3:        3.549596e-04        1.113710e-04          4.365859e-05
## 4:        1.577598e-04        2.227420e-05          0.000000e+00
## 5:        0.000000e+00        0.000000e+00          0.000000e+00
## 6:        0.000000e+00        4.454839e-05          0.000000e+00
##    CD4 Thymus CAKR Exp02 CD4 Thymus CAKR Exp03 CD4 Thymus WT Exp02
## 1:                     0          0.0000000000        4.972527e-05
## 2:                     0          0.0000000000        2.486263e-05
## 3:                     0          0.0000300075        2.486263e-05
## 4:                     0          0.0000000000        2.486263e-05
## 5:                     0          0.0000000000        2.486263e-05
## 6:                     0          0.0002100525        0.000000e+00
##    CD4 Thymus WT Exp03 CD8 Lymph nodes CA Exp02 CD8 Lymph nodes CA Exp03
## 1:        2.538554e-04             3.846495e-04             3.699984e-04
## 2:        3.173193e-05             2.662958e-04             1.849992e-04
## 3:        0.000000e+00             3.254727e-04             2.642846e-04
## 4:        3.173193e-05             1.183537e-04             2.642846e-04
## 5:        0.000000e+00             5.917685e-05             5.285692e-05
## 6:        3.173193e-05             2.367074e-04             5.285692e-05
##    CD8 Lymph nodes CAKR Exp02 CD8 Lymph nodes CAKR Exp03
## 1:               3.195909e-04               2.544113e-04
## 2:               1.452686e-04               7.268895e-05
## 3:               3.776984e-04               2.544113e-04
## 4:               9.878265e-04               4.179614e-04
## 5:               2.905372e-05               7.268895e-05
## 6:               1.452686e-04               1.272057e-04
##    CD8 Lymph nodes WT Exp01 CD8 Lymph nodes WT Exp02 CD8 Lymph nodes WT Exp03
## 1:             0.0005254621             3.488707e-04             0.0002286917
## 2:             0.0002741541             1.365146e-04             0.0001829533
## 3:             0.0005483082             6.219000e-04             0.0005031216
## 4:             0.0003198465             2.730293e-04             0.0002058225
## 5:             0.0003198465             4.550488e-05             0.0000000000
## 6:             0.0002056156             4.550488e-05             0.0001372150
##    CD8 Thymus CA Exp01 CD8 Thymus CA Exp03 CD8 Thymus CAKR Exp02
## 1:        0.0003316475        1.856493e-04          0.0000000000
## 2:        0.0002487356        9.282465e-05          0.0002597628
## 3:        0.0003316475        1.856493e-04          0.0000000000
## 4:        0.0001658237        3.094155e-05          0.0003463503
## 5:        0.0000000000        3.094155e-05          0.0000000000
## 6:        0.0000000000        1.237662e-04          0.0000000000
##    CD8 Thymus CAKR Exp03 CD8 Thymus WT Exp01 CD8 Thymus WT Exp02
## 1:          0.0000000000        1.804457e-04        4.479484e-04
## 2:          0.0002214839        3.308171e-04        2.687690e-04
## 3:          0.0001661130        4.210400e-04        3.583587e-04
## 4:          0.0002214839        2.706686e-04        2.687690e-04
## 5:          0.0001107420        6.014857e-05        0.000000e+00
## 6:          0.0000000000        1.202971e-04        8.958968e-05
##    CD8 Thymus WT Exp03
## 1:        2.347268e-04
## 2:        8.535518e-05
## 3:        2.774043e-04
## 4:        6.401639e-05
## 5:        2.133880e-05
## 6:        1.066940e-04

Gene segment usage analysis

To compare the gene segments used by T-cells from different mouse strains, we will use the R package Immunarch. First, we loaded the MiXCR outputs using the repLoad function and saved it as immdata_tra and immdata_trb, respectively to facilitate the analysis. We will load the prepared datasets:

immdata_tra <- readRDS("immdata_tra.rds")
immdata_trb <- readRDS("immdata_trb.rds")

Using the gene usage visualization from the Immunarch package, we will plot the gene segments of TRA chains and TRB chains for each cell type and each organ.

TRA CD4 Thymus

### Thymus

cd4_thymus <- geneUsage(repFilter(immdata_tra, 
                                 .method = "by.meta", 
                                 .query = list(Organ = include('Thymus'), Cell_type = include('CD4')))$data, .norm = T, .ambig = "maj", .quant = "count", .gene = "musmus.trav")

vis(cd4_thymus, .by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_tra$meta) + ggtitle("TRA Thymus CD4") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70"))

# source data
write.csv(cd4_thymus, "source_data/FigED7_cd4_thymus_tra.csv")


#ggsave("./final_fig/gene_usage_new/cd4_thymus_tra_just1segment.png", width = 60, height = 12, units = "cm") 
#ggsave("./final_fig/gene_usage_new/cd4_thymus_tra_just1segment.svg", width = 60, height = 12, units = "cm")

TRA CD4 Lymph nodes

### LN
cd4_ln <- geneUsage(repFilter(immdata_tra, 
                                 .method = "by.meta", 
                                 .query = list(Organ = exclude('Thymus'), Cell_type = include('CD4')))$data, .norm = T, .ambig = "maj", .quant = "count", .gene = "musmus.trav")

vis(cd4_ln, .by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_tra$meta) + ggtitle("TRA LN CD4") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70"))

# source data
write.csv(cd4_ln, "source_data/FigED7_cd4_ln_tra.csv")

#ggsave("./final_fig/gene_usage_new/cd4_ln_tra_just1segment.png", width = 60, height = 12, units = "cm")
#ggsave("./final_fig/gene_usage_new/cd4_ln_tra_just1segment.svg", width = 60, height = 12, units = "cm")

TRA CD8 Thymus

cd8_thymus <- geneUsage(repFilter(immdata_tra, 
                                 .method = "by.meta", 
                                 .query = list(Organ = include('Thymus'), Cell_type = include('CD8')))$data, .norm = T, .ambig = "maj", .quant = "count", .gene = "musmus.trav")

vis(cd8_thymus, .by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_tra$meta) + ggtitle("TRA Thymus CD8") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70"))

# source data
write.csv(cd8_thymus, "source_data/FigED7_cd8_thymus_tra.csv")

#ggsave("./final_fig/gene_usage_new/cd8_thymus_tra_just1segment.png", width = 60, height = 12, units = "cm")
#ggsave("./final_fig/gene_usage_new/cd8_thymus_tra_just1segment.svg", width = 60, height = 12, units = "cm")

TRA CD8 Lymph nodes

cd8_ln <- geneUsage(repFilter(immdata_tra, 
                                 .method = "by.meta", 
                                 .query = list(Organ = exclude('Thymus'), Cell_type = include('CD8')))$data, .norm = T, .ambig = "maj", .quant = "count", .gene = "musmus.trav")

vis(cd8_ln, .by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_tra$meta) + ggtitle("TRA LN CD8") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70"))

# source data
write.csv(cd8_ln, "source_data/FigED7_cd8_ln_tra.csv")

#ggsave("./final_fig/gene_usage_new/cd8_ln_tra_just1segment.png", width = 60, height = 12, units = "cm")
#ggsave("./final_fig/gene_usage_new/cd8_ln_tra_just1segment.svg", width = 60, height = 12, units = "cm")

TRB CD4 Thymus

### Thymus

cd4_thymus <- geneUsage(repFilter(immdata_trb, 
                                 .method = "by.meta", 
                                 .query = list(Organ = include('Thymus'), Cell_type = include('CD4')))$data, .norm = T, .ambig = "maj", .quant = "count", .gene = "musmus.trbv")

vis(cd4_thymus, .by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_trb$meta, .test = F) + ggtitle("TRB Thymus CD4") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70")) 

# source data
write.csv(cd4_thymus, "source_data/FigED7_cd4_thymus_trb.csv")

#ggsave("./final_fig/gene_usage_new/cd4_thymus_trb_just1segment.png", width = 20, height = 10, units = "cm") 
#ggsave("./final_fig/gene_usage_new/cd4_thymus_trb_just1segment.svg", width = 20, height = 10, units = "cm")

TRB CD4 Lymph nodes

### LN
cd4_ln <- geneUsage(repFilter(immdata_trb, 
                                 .method = "by.meta", 
                                 .query = list(Organ = exclude('Thymus'), Cell_type = include('CD4')))$data, .norm = T, .ambig = "maj", .quant = "count", .gene = "musmus.trbv")

vis(cd4_ln, .by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_trb$meta, .test = F) + ggtitle("TRB LN CD4") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70"))

# source data
write.csv(cd4_ln, "source_data/FigED7_cd4_ln_trb.csv")

#ggsave("./final_fig/gene_usage_new/cd4_ln_trb_just1segment.png", width = 20, height = 10, units = "cm")
#ggsave("./final_fig/gene_usage_new/cd4_ln_trb_just1segment.svg", width = 20, height = 10, units = "cm")

TRB CD8 Thymus

cd8_thymus <- geneUsage(repFilter(immdata_trb, 
                                 .method = "by.meta", 
                                 .query = list(Organ = include('Thymus'), Cell_type = include('CD8')))$data, .norm = T, .ambig = "maj", .quant = "count", .gene = "musmus.trbv")

vis(cd8_thymus, .by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_trb$meta, .test = F) + ggtitle("TRB Thymus CD8") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70"))

# source data
write.csv(cd8_thymus, "source_data/FigED7_cd8_thymus_trb.csv")

#ggsave("./final_fig/gene_usage_new/cd8_thymus_trb_just1segment.png", width = 20, height = 10, units = "cm")
#ggsave("./final_fig/gene_usage_new/cd8_thymus_trb_just1segment.svg", width = 20, height = 10, units = "cm")

TRB CD8 Lymph nodes

cd8_ln <- geneUsage(repFilter(immdata_trb, 
                                 .method = "by.meta", 
                                 .query = list(Organ = exclude('Thymus'), Cell_type = include('CD8')))$data, .norm = T, .ambig = "maj", .quant = "count", .gene = "musmus.trbv")

vis(cd8_ln, .by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_trb$meta, .test = F) + ggtitle("TRB LN CD8") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70"))

# source data
write.csv(cd8_ln, "source_data/FigED7_cd8_ln_trb.csv")

#ggsave("./final_fig/gene_usage_new/cd8_ln_trb_just1segment.png", width = 20, height = 10, units = "cm")
#ggsave("./final_fig/gene_usage_new/cd8_ln_trb_just1segment.svg", width = 20, height = 10, units = "cm")

Analysis of NKT cells

Next, we will identify gene segments that are typical for invariant NKT cells. These segments are TRAV11 and TRAJ18 for TRA and TRBV1, TRBV13-2 and TRBV29 for TRB. Please, note that some of the included TRB chains are used by conventional cells as well.

# NKT analysis
merged_repertoires <- merged_repertoires %>% 
  mutate(is_nkt = if_else(
    (grepl(allVHitsWithScore, pattern = "TRAV11")) & (grepl(allJHitsWithScore, pattern = "TRAJ18")) |
    (grepl(allVHitsWithScore, pattern = "TRBV13-2")) |
      (grepl(allVHitsWithScore, pattern = "TRBV1\\*")) |
      (grepl(allVHitsWithScore, pattern = "TRBV29"))  ,"yes","no")) %>% 
  mutate(nkt_trav11_traj18 = if_else(
    (grepl(allVHitsWithScore, pattern = "TRAV11") & (grepl(allJHitsWithScore, pattern = "TRAJ18"))),"yes","no")) %>% mutate(nkt_trbv13_2 = if_else(
    (grepl(allVHitsWithScore, pattern = "TRBV13-2")),"yes","no")) %>% mutate(nkt_trbv29 = if_else(
    (grepl(allVHitsWithScore, pattern = "TRBV29")),"yes","no")) %>% mutate(nkt_trbv1 = if_else(
    (grepl(allVHitsWithScore, pattern = "TRBV1\\*")),"yes","no"))

md <- md %>% select(Exp, Organ, Cell_type, Mouse_strain, num_id) %>% mutate_at(vars("num_id"), as.numeric)

# All TRB NKT
nkt_table_trb <- merged_repertoires %>% 
   filter(chain == "TRB") %>% 
 mutate(new_name = paste(Exp, Organ, Cell_type, Mouse_strain)) %>% 
  dplyr::select(new_name, cloneCount, is_nkt, num_id) %>% 
  uncount(cloneCount) %>% 
  group_by(num_id, is_nkt) %>% 
    summarise(n = n()) %>% 
mutate(freq = n / sum(n)) %>% dplyr::filter(is_nkt == "yes") %>% 
  ungroup %>% 
  left_join(md) %>% 
  unique %>%
   mutate(sample_type = paste(Cell_type, Organ, Mouse_strain)) 

# TRAV11 TRAJ18
nkt_table_trav11_traj18 <- merged_repertoires %>% 
   filter(chain == "TRA") %>% 
 mutate(new_name = paste(Exp, Organ, Cell_type, Mouse_strain)) %>% 
  dplyr::select(new_name, cloneCount, nkt_trav11_traj18, num_id) %>% 
  uncount(cloneCount) %>% 
  group_by(num_id, nkt_trav11_traj18) %>% 
    summarise(n = n()) %>% 
mutate(freq = n / sum(n)) %>% dplyr::filter(nkt_trav11_traj18 == "yes") %>% 
  ungroup %>% 
  left_join(md) %>% 
  unique %>%
   mutate(sample_type = paste(Cell_type, Organ, Mouse_strain)) 

# TRBV1
nkt_table_trbv1 <- merged_repertoires %>% 
   filter(chain == "TRB") %>% 
 mutate(new_name = paste(Exp, Organ, Cell_type, Mouse_strain)) %>% 
  dplyr::select(new_name, cloneCount, nkt_trbv1, num_id) %>% 
  uncount(cloneCount) %>% 
  group_by(num_id, nkt_trbv1) %>% 
    summarise(n = n()) %>% 
mutate(freq = n / sum(n)) %>% dplyr::filter(nkt_trbv1 == "yes") %>% 
  ungroup %>% 
  left_join(md) %>% 
  unique %>%
   mutate(sample_type = paste(Cell_type, Organ, Mouse_strain)) 

# TRBV13-2
nkt_table_trbv13_2 <- merged_repertoires %>% 
   filter(chain == "TRB") %>% 
 mutate(new_name = paste(Exp, Organ, Cell_type, Mouse_strain)) %>% 
  dplyr::select(new_name, cloneCount, nkt_trbv13_2, num_id) %>% 
  uncount(cloneCount) %>% 
  group_by(num_id, nkt_trbv13_2) %>% 
    summarise(n = n()) %>% 
mutate(freq = n / sum(n)) %>% dplyr::filter(nkt_trbv13_2 == "yes") %>% 
  ungroup %>% 
  left_join(md) %>% 
  unique %>%
   mutate(sample_type = paste(Cell_type, Organ, Mouse_strain)) 


# TRBV29
nkt_table_trbv29 <- merged_repertoires %>% 
   filter(chain == "TRB") %>% 
 mutate(new_name = paste(Exp, Organ, Cell_type, Mouse_strain)) %>% 
  dplyr::select(new_name, cloneCount, nkt_trbv29, num_id) %>% 
  uncount(cloneCount) %>% 
  group_by(num_id, nkt_trbv29) %>% 
    summarise(n = n()) %>% 
mutate(freq = n / sum(n)) %>% dplyr::filter(nkt_trbv29 == "yes") %>% 
  ungroup %>% 
  left_join(md) %>% 
  unique %>%
   mutate(sample_type = paste(Cell_type, Organ, Mouse_strain)) 

levels_cd4 <- c("CD4 Thymus WT", "CD4 Thymus CA", "CD4 Thymus CAKR", "CD4 Lymph nodes WT", "CD4 Lymph nodes CA", "CD4 Lymph nodes CAKR")
levels_cd8 <- c("CD8 Thymus WT", "CD8 Thymus CA", "CD8 Thymus CAKR", "CD8 Lymph nodes WT", "CD8 Lymph nodes CA", "CD8 Lymph nodes CAKR")

nkt_table_trb$nkt_sample <- "all_nkt_trb"
nkt_table_trav11_traj18$nkt_sample <- "trav11_traj18"
nkt_table_trbv1$nkt_sample <- "trbv1"
nkt_table_trbv13_2$nkt_sample <- "trbv13_2"
nkt_table_trbv29$nkt_sample <- "trbv29"

nkt_table <- cbind(nkt_table_trb$num_id, nkt_table_trb$sample_type, nkt_table_trb$Exp, format(nkt_table_trb$freq*100, digits = 2), 
                   format(nkt_table_trav11_traj18$freq*100, digits = 2),
format(nkt_table_trbv1$freq*100, digits = 2), 
format(nkt_table_trbv13_2$freq*100, digits = 2),
                   format(nkt_table_trbv29$freq*100, digits = 2)) %>% as.data.frame

colnames(nkt_table) <- c("num_id","Sample","Exp","All TRB NKT", "TRAV11-TRAJ18", "TRBV1","TRBV13-2","TRBV29")
nkt_table <- nkt_table %>% mutate(Sample = paste(Sample, Exp)) %>% select(-num_id, -Exp) %>% arrange(Sample)

See the percentage of NKT gene segments in each sample:

kable(nkt_table, format = "html") %>%
  kable_styling(full_width = F, font_size = 11, 
                bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Sample All TRB NKT TRAV11-TRAJ18 TRBV1 TRBV13-2 TRBV29
CD4 Lymph nodes CA 1 18 2.574 3.2 12.0 2.7
CD4 Lymph nodes CA 2 19 2.326 3.1 12.4 3.1
CD4 Lymph nodes CA 3 16 2.046 2.4 10.9 2.8
CD4 Lymph nodes CAKR 2 19 1.704 3.1 12.7 2.9
CD4 Lymph nodes CAKR 3 18 2.357 3.4 12.0 3.0
CD4 Lymph nodes WT 2 17 0.474 2.8 11.4 2.9
CD4 Lymph nodes WT 3 17 0.307 3.1 11.1 2.9
CD4 Thymus CA 1 36 25.798 3.2 22.8 10.1
CD4 Thymus CA 3 38 26.069 3.3 21.9 12.4
CD4 Thymus CAKR 1 30 22.058 3.3 20.5 6.5
CD4 Thymus CAKR 2 32 38.268 6.9 18.5 6.8
CD4 Thymus CAKR 3 38 23.262 4.2 24.9 8.4
CD4 Thymus WT 2 23 8.824 3.4 13.6 5.8
CD4 Thymus WT 3 19 6.685 3.8 10.7 4.5
CD8 Lymph nodes CA 2 21 0.076 2.8 8.4 9.9
CD8 Lymph nodes CA 3 20 0.137 2.6 8.5 9.1
CD8 Lymph nodes CAKR 2 24 0.084 3.0 12.2 8.7
CD8 Lymph nodes CAKR 3 20 0.080 2.7 8.9 8.2
CD8 Lymph nodes WT 1 23 0.117 2.8 10.8 9.6
CD8 Lymph nodes WT 2 21 0.051 2.4 9.1 9.8
CD8 Lymph nodes WT 3 21 0.091 2.4 8.9 9.5
CD8 Thymus CA 1 22 0.219 2.9 11.6 7.8
CD8 Thymus CA 3 19 0.102 2.4 8.4 7.8
CD8 Thymus CAKR 2 23 0.521 2.8 12.4 7.6
CD8 Thymus CAKR 3 19 0.188 3.2 10.5 5.6
CD8 Thymus WT 1 23 0.126 2.2 11.1 10.1
CD8 Thymus WT 2 26 0.115 2.7 12.9 10.2
CD8 Thymus WT 3 20 0.126 2.3 8.6 9.0
# source data
write.csv(nkt_table, "source_data/FigED7b_nkt_cells.csv", row.names = F)
## Error in file(file, ifelse(append, "a", "w")): cannot open the connection

Now we will plot the data for CD4 cells and for CD8 cells. Plots for TRA and TRB sequences are separated.

All NKT

This plot visualizes the percentage of any of the TRB NKT chains (TRBV1, TRBV13-2 and TRBV29). We can see that this distinction is not specific as these gene segments are also used by conventional CD4 and CD8 T cells.

## CD4
p01 <- nkt_table_trb %>% 
  filter(Cell_type == "CD4") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd4))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("NKT segments in CD4 TRB") + 
  theme_classic() + 
  ggtheme() + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_all_nkt_trb_cd4.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_all_nkt_trb_cd4.eps", width = 8, height = 8, units = "cm")

## CD8
p02 <- nkt_table_trb %>% 
  filter(Cell_type == "CD8") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd8))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("NKT segments in CD8 TRB") + 
  theme_classic() + 
  ggtheme() + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_all_nkt_trb_cd8.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_all_nkt_trb_cd8.eps", width = 8, height = 8, units = "cm")

p01 + p02

TRAV11 TRAJ18

## CD4
p03 <- nkt_table_trav11_traj18 %>% 
  filter(Cell_type == "CD4") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd4))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("TRAV11-TRAJ18 segments in CD4 cells") + 
  theme_classic() + 
  ggtheme() + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_nkt_trav11_traj18_tra_cd4.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_nkt_trav11_traj18_tra_cd4.eps", width = 8, height = 8, units = "cm")

## CD8
p04 <- nkt_table_trav11_traj18 %>% 
  filter(Cell_type == "CD8") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd8))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("TRAV11-TRAJ18 segments in CD8 cells") + 
  theme_classic() + 
  ggtheme() + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,1)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_nkt_trav11_traj18_tra_cd8.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_nkt_trav11_traj18_tra_cd8.eps", width = 8, height = 8, units = "cm")

p03 + p04

TRBV1

## CD4
p05 <- nkt_table_trbv1 %>% 
  filter(Cell_type == "CD4") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd4))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("CD4 TRBV1") + 
  theme_classic() + 
  ggtheme() + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_nkt_trbv1_cd4.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_nkt_trbv1_cd4.eps", width = 8, height = 8, units = "cm")

## CD8
p06 <- nkt_table_trbv1 %>% 
  filter(Cell_type == "CD8") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd8))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("CD8 TRBV1") + 
  theme_classic() + 
  ggtheme() + 
  
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_nkt_trbv1_cd8.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_nkt_trbv1_cd8.eps", width = 8, height = 8, units = "cm")

p05 + p06

TRBV13-2

## CD4
p07 <- nkt_table_trbv13_2 %>% 
  filter(Cell_type == "CD4") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd4))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("TRBV13-2 segments in CD4 cells") + 
  theme_classic() + 
  ggtheme() + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_nkt_trbv13_2_cd4.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_nkt_trbv13_2_cd4.eps", width = 8, height = 8, units = "cm")

## CD8
p08 <- nkt_table_trbv13_2 %>% 
  filter(Cell_type == "CD8") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd8))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("TRBV13-2 segments in CD8 cells") + 
  theme_classic() + 
  ggtheme() + 
  
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_nkt_trbv13_2_cd8.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_nkt_trbv13_2_cd8.eps", width = 8, height = 8, units = "cm")

p07 + p08

TRBV29

## CD4
p09 <- nkt_table_trbv29 %>% 
  filter(Cell_type == "CD4") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd4))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("CD4 TRBV29") + 
  theme_classic() + 
  ggtheme() + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_nkt_trbv29_cd4.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_nkt_trbv29_cd4.eps", width = 8, height = 8, units = "cm")

## CD8
p10 <- nkt_table_trbv29 %>% 
  filter(Cell_type == "CD8") %>% 
  ggplot(aes(y = freq*100, x = factor(sample_type, levels = levels_cd8))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Percentage of all chains") + 
  ggtitle("CD8 TRBV29") + 
  theme_classic() + 
  ggtheme() + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")
  
#ggsave("./final_fig/nkt/pct_nkt_trbv29_cd8.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/nkt/pct_nkt_trbv29_cd8.eps", width = 8, height = 8, units = "cm")

p09 + p10

Principal component analysis

In this part we explore the similarity of TCR repertoires of different samples using principal component analysis (PCA).

We will calculate the distances from a filtered matrix of proportions, in which we filtered out the CDR3 sequences that are present in less than five CD4 or in less than five CD8 samples.

Preparing filtered normalized count matrices

First, we will prepare the filtered normalized count matrix.

Below, we find and save the CDR3 TRA/TRB amino acid sequences, which are present in 5 or more samples out of 14 CD4 samples, or in 5 or more samples out of 14 CD8 samples.

# TRA
keep_tra_cd4 <- excel_count_table_tra %>% 
  mutate(nkt_trav11_traj18 = if_else(
    (grepl(allVHitsWithScore, pattern = "TRAV11") | 
    (grepl(allJHitsWithScore, pattern = "TRAJ18"))),"yes","no")) %>% 
  filter(nkt_trav11_traj18 == "no") %>% 
  select(aaSeqCDR3, starts_with("CD4")) %>% 
  mutate_at(vars(starts_with("CD4")), .funs = binary) 
 
keep_tra_cd4$sum <- rowSums((keep_tra_cd4 %>% select(-aaSeqCDR3)))

keep_tra_cd4$keep <- ifelse(keep_tra_cd4$sum>4,1,0)
keep_tra_cd4$orig <- ifelse(keep_tra_cd4$sum>0,1,0)

keep_tra_cd4_sequences <- pull(keep_tra_cd4 %>% filter(keep == 1), aaSeqCDR3)

# TRB
keep_trb_cd4 <- excel_count_table_trb %>% select(aaSeqCDR3, starts_with("CD4")) %>%  mutate_at(vars(starts_with("CD4")), .funs = binary) 
 
keep_trb_cd4$sum <- rowSums((keep_trb_cd4 %>% select(-aaSeqCDR3)))

keep_trb_cd4$keep <- ifelse(keep_trb_cd4$sum>4,1,0)
keep_trb_cd4$orig <- ifelse(keep_trb_cd4$sum>0,1,0)

keep_trb_cd4_sequences <- pull(keep_trb_cd4 %>% filter(keep == 1), aaSeqCDR3)

# TRA
keep_tra_cd8 <- excel_count_table_tra %>% 
  mutate(nkt_trav11_traj18 = if_else(
    (grepl(allVHitsWithScore, pattern = "TRAV11") | 
    (grepl(allJHitsWithScore, pattern = "TRAJ18"))),"yes","no")) %>% 
  filter(nkt_trav11_traj18 == "no") %>% 
  select(aaSeqCDR3, starts_with("CD8")) %>% 
  mutate_at(vars(starts_with("CD8")), .funs = binary) 

keep_tra_cd8$sum <- rowSums((keep_tra_cd8 %>% select(-aaSeqCDR3)))

keep_tra_cd8$keep <- ifelse(keep_tra_cd8$sum>4,1,0)
keep_tra_cd8$orig <- ifelse(keep_tra_cd8$sum>0,1,0)

keep_tra_cd8_sequences <- pull(keep_tra_cd8 %>% filter(keep == 1), aaSeqCDR3)

# TRB
keep_trb_cd8 <- excel_count_table_trb %>% select(aaSeqCDR3, starts_with("CD8")) %>%  mutate_at(vars(starts_with("CD8")), .funs = binary) 
 
keep_trb_cd8$sum <- rowSums((keep_trb_cd8 %>% select(-aaSeqCDR3)))

keep_trb_cd8$keep <- ifelse(keep_trb_cd8$sum>4,1,0)
keep_trb_cd8$orig <- ifelse(keep_trb_cd8$sum>0,1,0)

keep_trb_cd8_sequences <- pull(keep_trb_cd8 %>% filter(keep == 1), aaSeqCDR3)


cdr3_count_table <- data.frame(Sample = c("CD4 TRA","CD4 TRB","CD8 TRA","CD8 TRB"),
                               `Count before filtering` = c(
                                 length(pull(keep_tra_cd4 %>% filter(orig == 1), aaSeqCDR3)),
                                 length(pull(keep_trb_cd4 %>% filter(orig == 1), aaSeqCDR3)),
                                 length(pull(keep_tra_cd8 %>% filter(orig == 1), aaSeqCDR3)),
                                 length(pull(keep_trb_cd8 %>% filter(orig == 1), aaSeqCDR3))),
                               `Count after filtering` = c(
                                 length(pull(keep_tra_cd4 %>% filter(keep == 1), aaSeqCDR3)),
                                 length(pull(keep_trb_cd4 %>% filter(keep == 1), aaSeqCDR3)),
                                 length(pull(keep_tra_cd8 %>% filter(keep == 1), aaSeqCDR3)),
                                 length(pull(keep_trb_cd8 %>% filter(keep == 1), aaSeqCDR3))))

Here you can check the counts of CDR3 sequences before and after filtering.

kable(cdr3_count_table) %>%
  kable_styling(full_width = F, font_size = 11, 
                bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Sample Count.before.filtering Count.after.filtering
CD4 TRA 64812 3406
CD4 TRB 245611 7201
CD8 TRA 65540 3227
CD8 TRB 238965 7808

Now we will create filtered tables with the abundant sequences.

# TRB
excel_count_table_trb3_filter <- excel_count_table_trb2  %>% filter(aaSeqCDR3 %in% keep_trb_cd4_sequences | aaSeqCDR3 %in% keep_trb_cd8_sequences) 

count_table_trb4_filter <- as.matrix(excel_count_table_trb3_filter[,5:32])
rownames(count_table_trb4_filter) <- excel_count_table_trb3_filter$aaSeqCDR3

trb4_norm_filter <- scale(count_table_trb4_filter, center=FALSE, scale=colSums(count_table_trb4_filter))

prop.table.trb_filter <- cbind(trb4_norm_filter, excel_count_table_trb3_filter %>% select(-starts_with("CD")) )

# TRA
excel_count_table_tra3_filter <- excel_count_table_tra3  %>% filter(aaSeqCDR3 %in% keep_tra_cd4_sequences | aaSeqCDR3 %in% keep_tra_cd8_sequences) %>% filter(nkt_trav11_traj18 == "no")

count_table_tra4_filter <- as.matrix(excel_count_table_tra3_filter[,5:32])
rownames(count_table_tra4_filter) <- excel_count_table_tra3_filter$aaSeqCDR3

tra4_norm_filter <- scale(count_table_tra4_filter, center=FALSE, scale=colSums(count_table_tra4_filter))

prop.table.tra_filter <- cbind(tra4_norm_filter, excel_count_table_tra3_filter %>% select(-starts_with("CD")) )

colnames(prop.table.tra_filter)
##  [1] "CD4 Lymph nodes CA Exp01"   "CD4 Lymph nodes CA Exp02"  
##  [3] "CD4 Lymph nodes CA Exp03"   "CD4 Lymph nodes CAKR Exp02"
##  [5] "CD4 Lymph nodes CAKR Exp03" "CD4 Lymph nodes WT Exp02"  
##  [7] "CD4 Lymph nodes WT Exp03"   "CD4 Thymus CA Exp01"       
##  [9] "CD4 Thymus CA Exp03"        "CD4 Thymus CAKR Exp01"     
## [11] "CD4 Thymus CAKR Exp02"      "CD4 Thymus CAKR Exp03"     
## [13] "CD4 Thymus WT Exp02"        "CD4 Thymus WT Exp03"       
## [15] "CD8 Lymph nodes CA Exp02"   "CD8 Lymph nodes CA Exp03"  
## [17] "CD8 Lymph nodes CAKR Exp02" "CD8 Lymph nodes CAKR Exp03"
## [19] "CD8 Lymph nodes WT Exp01"   "CD8 Lymph nodes WT Exp02"  
## [21] "CD8 Lymph nodes WT Exp03"   "CD8 Thymus CA Exp01"       
## [23] "CD8 Thymus CA Exp03"        "CD8 Thymus CAKR Exp02"     
## [25] "CD8 Thymus CAKR Exp03"      "CD8 Thymus WT Exp01"       
## [27] "CD8 Thymus WT Exp02"        "CD8 Thymus WT Exp03"       
## [29] "aaSeqCDR3"                  "allDHitsWithScore"         
## [31] "allJHitsWithScore"          "allVHitsWithScore"         
## [33] "nkt_trav11_traj18"

Last, we will select the columns of the filtered table corresponding to CD4 and CD8 cells. We will remove the rows in which there are only zeros for the subset of CD4 or CD8 cells, respectively.

prop.table.tra2_filter.cd4 <- prop.table.tra_filter[,1:14]
prop.table.tra2_filter.cd4$sum <- rowSums(prop.table.tra2_filter.cd4)
prop.table.tra2_filter.cd4 <- prop.table.tra2_filter.cd4 %>% filter(sum >0) %>% select(-sum)

prop.table.trb2_filter.cd4 <- prop.table.trb_filter[,1:14]
prop.table.trb2_filter.cd4$sum <- rowSums(prop.table.trb2_filter.cd4)
prop.table.trb2_filter.cd4 <- prop.table.trb2_filter.cd4 %>% filter(sum >0) %>% select(-sum)

prop.table.tra2_filter.cd8 <- prop.table.tra_filter[,15:28]
prop.table.tra2_filter.cd8$sum <- rowSums(prop.table.tra2_filter.cd8)
prop.table.tra2_filter.cd8 <- prop.table.tra2_filter.cd8 %>% filter(sum >0) %>% select(-sum)

prop.table.trb2_filter.cd8 <- prop.table.trb_filter[,15:28]
prop.table.trb2_filter.cd8$sum <- rowSums(prop.table.trb2_filter.cd8)
prop.table.trb2_filter.cd8 <- prop.table.trb2_filter.cd8 %>% filter(sum >0) %>% select(-sum)

PCA plots

Using the tables that we generated previously, we will compute the PCA with the package factoextra. First, we will merge TRA and TRB CDR3 sequences together and the we will also perform separated analyses.

TRA + TRB CD4

prop.table.tra2_filter.merge <- rbind(prop.table.tra2_filter.cd4, prop.table.trb2_filter.cd4)
  
res.pca.merge.cd4 <- prcomp(t(prop.table.tra2_filter.merge), scale = TRUE, center = T)

mdres.pca.merge.cd4  <-  colnames(prop.table.tra2_filter.merge[,1:14])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))

fviz_pca_ind(res.pca.merge.cd4,
             col.ind = as.factor(mdres.pca.merge.cd4$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.merge.cd4$Organ),
                  color = as.factor(mdres.pca.merge.cd4$Strain))) + 
  scale_shape_manual(values = c(8,8,8,15,15)) +
  ggtheme() +
  theme(axis.text.x = element_text(angle = 0))

# source data
# write.csv(res.pca.merge.cd4$x, "source_data/pca_raw_cd4.csv")

#ggsave("final_fig/pca/cd4.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd4.svg", width = 2.6, height = 1.7)

As the points in the central location are overlapping, we will also create a zoom plot. This will be added to the previous figure manually in a graphic editor.

fviz_pca_ind(res.pca.merge.cd4,
             col.ind = as.factor(mdres.pca.merge.cd4$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.merge.cd4$Organ),
                  color = as.factor(mdres.pca.merge.cd4$Strain))) + 
  scale_shape_manual(values = c(8,8,8,15,15)) +
  xlim(c(-30,0)) + ylim(-2,15) +
  ggtheme() +
  theme(axis.text.x = element_text(angle = 0))

#ggsave("final_fig/pca/cd4_zoom.png", width = 2.2, height = 1.4)
#ggsave("final_fig/pca/cd4_zoom.svg", width = 2.2, height = 1.4)

TRA + TRB CD8

prop.table.tra2_filter.merge <- rbind(prop.table.tra2_filter.cd8, prop.table.trb2_filter.cd8)
  
res.pca.merge.cd8 <- prcomp(t(prop.table.tra2_filter.merge), scale = TRUE, center = T)

mdres.pca.merge.cd8  <-  colnames(prop.table.tra2_filter.merge[,1:14])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
  

fviz_pca_ind(res.pca.merge.cd8,
             col.ind = as.factor(mdres.pca.merge.cd8$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.merge.cd8$Organ),
                  color = as.factor(mdres.pca.merge.cd8$Strain))) + 
  scale_shape_manual(values = c(8,8,8,15,15)) +
  ggtheme()

# source data
write.csv(res.pca.merge.cd8$x, "source_data/pca_raw_cd8.csv")

#ggsave("final_fig/pca/cd8.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd8.svg", width = 2.6, height = 1.7)

As the points in the central location are overlapping, we will also create a zoom plot. This will be added to the previous figure manually in a graphic editor.

fviz_pca_ind(res.pca.merge.cd8,
             col.ind = as.factor(mdres.pca.merge.cd8$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.merge.cd8$Organ),
                  color = as.factor(mdres.pca.merge.cd8$Strain))) + 
  scale_shape_manual(values = c(8,8,8,15,15)) +
  xlim(c(-32,-18)) + ylim(-4,5) +
  ggtheme()

#ggsave("final_fig/pca/cd8_zoom.png", width = 2.2, height = 1.4)
#ggsave("final_fig/pca/cd8_zoom.svg", width = 2.2, height = 1.4)

Next, we will create eight separate plots which show the PCA for TRA/TRB CDR3 sequences from each cell type and each organ.

TRA CD4 Thymus

## Thymus
prop.table.tra2_filter.cd4.thy <- prop.table.tra_filter[,8:14]
prop.table.tra2_filter.cd4.thy$sum <- rowSums(prop.table.tra2_filter.cd4.thy)

prop.table.tra2_filter.cd4.thy <- prop.table.tra2_filter.cd4.thy %>% filter(sum >0) %>% select(-sum)

res.pca.tra.cd4.thy <- prcomp(t(prop.table.tra2_filter.cd4.thy), scale = TRUE, center = T)

mdres.pca.tra.cd4.thy  <-  colnames(prop.table.tra_filter[,8:14])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
  
fviz_pca_ind(res.pca.tra.cd4.thy,
             col.ind = as.factor(mdres.pca.tra.cd4.thy$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.tra.cd4.thy$Organ),
                  color = as.factor(mdres.pca.tra.cd4.thy$Strain))) +
  geom_hline(yintercept = 0, color = "grey20") +
  geom_vline(xintercept = 0, color = "grey20") +
  ggtheme()

#ggsave("final_fig/pca/cd4_tra_thy.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd4_tra_thy.svg", width = 2.6, height = 1.7)

TRA CD4 Lymph nodes

## LN
prop.table.tra2_filter.cd4.ln <- prop.table.tra_filter[,1:7]
prop.table.tra2_filter.cd4.ln$sum <- rowSums(prop.table.tra2_filter.cd4.ln)

prop.table.tra2_filter.cd4.ln <- prop.table.tra2_filter.cd4.ln %>% filter(sum >0) %>% select(-sum)

res.pca.tra.cd4.ln <- prcomp(t(prop.table.tra2_filter.cd4.ln), scale = TRUE, center = T)

mdres.pca.tra.cd4.ln  <-  colnames(prop.table.tra_filter[,1:7])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
  

fviz_pca_ind(res.pca.tra.cd4.ln,
             col.ind = as.factor(mdres.pca.tra.cd4.ln$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.tra.cd4.ln$Organ),
                  color = as.factor(mdres.pca.tra.cd4.ln$Strain))) +
  geom_hline(yintercept = 0, color = "grey20") +
  geom_vline(xintercept = 0, color = "grey20") 

#ggsave("final_fig/pca/cd4_tra_ln.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd4_tra_ln.svg", width = 2.6, height = 1.7)

TRA CD8 Thymus

## Thymus
prop.table.tra2_filter %>% colnames
## Error in is.data.frame(x): object 'prop.table.tra2_filter' not found
prop.table.tra2_filter.cd8.thy <- prop.table.tra_filter[,22:28]
prop.table.tra2_filter.cd8.thy$sum <- rowSums(prop.table.tra2_filter.cd8.thy)

prop.table.tra2_filter.cd8.thy <- prop.table.tra2_filter.cd8.thy %>% filter(sum >0) %>% select(-sum)

res.pca.tra.cd8.thy <- prcomp(t(prop.table.tra2_filter.cd8.thy), scale = TRUE, center = T)

mdres.pca.tra.cd8.thy  <-  colnames(prop.table.tra_filter[,22:28])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
 
fviz_pca_ind(res.pca.tra.cd8.thy,
             col.ind = as.factor(mdres.pca.tra.cd8.thy$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.tra.cd8.thy$Organ),
                  color = as.factor(mdres.pca.tra.cd8.thy$Strain))) +
  geom_hline(yintercept = 0, color = "grey20") +
  geom_vline(xintercept = 0, color = "grey20") 

#ggsave("final_fig/pca/cd8_tra_thy.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd8_tra_thy.svg", width = 2.6, height = 1.7)

TRA CD8 Lymph nodes

## LN
prop.table.tra2_filter %>% colnames
## Error in is.data.frame(x): object 'prop.table.tra2_filter' not found
prop.table.tra2_filter.cd8.ln <- prop.table.tra_filter[,15:21]
prop.table.tra2_filter.cd8.ln$sum <- rowSums(prop.table.tra2_filter.cd8.ln)

prop.table.tra2_filter.cd8.ln <- prop.table.tra2_filter.cd8.ln %>% filter(sum >0) %>% select(-sum)

res.pca.tra.cd8.ln <- prcomp(t(prop.table.tra2_filter.cd8.ln), scale = TRUE, center = T)

mdres.pca.tra.cd8.ln  <-  colnames(prop.table.tra_filter[,15:21])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
  
fviz_pca_ind(res.pca.tra.cd8.ln,
             col.ind = as.factor(mdres.pca.tra.cd8.ln$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.tra.cd8.ln$Organ),
                  color = as.factor(mdres.pca.tra.cd8.ln$Strain))) +
  geom_hline(yintercept = 0, color = "grey20") +
  geom_vline(xintercept = 0, color = "grey20") 

#ggsave("final_fig/pca/cd8_tra_ln.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd8_tra_ln.svg", width = 2.6, height = 1.7)

TRB CD4 Thymus

## Thymus
prop.table.trb_filter %>% colnames
##  [1] "CD4 Lymph nodes CA Exp01"   "CD4 Lymph nodes CA Exp02"  
##  [3] "CD4 Lymph nodes CA Exp03"   "CD4 Lymph nodes CAKR Exp02"
##  [5] "CD4 Lymph nodes CAKR Exp03" "CD4 Lymph nodes WT Exp02"  
##  [7] "CD4 Lymph nodes WT Exp03"   "CD4 Thymus CA Exp01"       
##  [9] "CD4 Thymus CA Exp03"        "CD4 Thymus CAKR Exp01"     
## [11] "CD4 Thymus CAKR Exp02"      "CD4 Thymus CAKR Exp03"     
## [13] "CD4 Thymus WT Exp02"        "CD4 Thymus WT Exp03"       
## [15] "CD8 Lymph nodes CA Exp02"   "CD8 Lymph nodes CA Exp03"  
## [17] "CD8 Lymph nodes CAKR Exp02" "CD8 Lymph nodes CAKR Exp03"
## [19] "CD8 Lymph nodes WT Exp01"   "CD8 Lymph nodes WT Exp02"  
## [21] "CD8 Lymph nodes WT Exp03"   "CD8 Thymus CA Exp01"       
## [23] "CD8 Thymus CA Exp03"        "CD8 Thymus CAKR Exp02"     
## [25] "CD8 Thymus CAKR Exp03"      "CD8 Thymus WT Exp01"       
## [27] "CD8 Thymus WT Exp02"        "CD8 Thymus WT Exp03"       
## [29] "aaSeqCDR3"                  "allVHitsWithScore"         
## [31] "allDHitsWithScore"          "allJHitsWithScore"
prop.table.trb2_filter.cd4.thy <- prop.table.trb_filter[,12:18]
prop.table.trb2_filter.cd4.thy$sum <- rowSums(prop.table.trb2_filter.cd4.thy)

prop.table.trb2_filter.cd4.thy <- prop.table.trb2_filter.cd4.thy %>% filter(sum >0) %>% select(-sum)

res.pca.trb.cd4.thy <- prcomp(t(prop.table.trb2_filter.cd4.thy), scale = TRUE, center = T)

mdres.pca.trb.cd4.thy  <-  colnames(prop.table.trb_filter[,12:18])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
 
fviz_pca_ind(res.pca.trb.cd4.thy,
             col.ind = as.factor(mdres.pca.trb.cd4.thy$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.trb.cd4.thy$Organ),
                  color = as.factor(mdres.pca.trb.cd4.thy$Strain))) +
  geom_hline(yintercept = 0, color = "grey20") +
  geom_vline(xintercept = 0, color = "grey20") 

#ggsave("final_fig/pca/cd4_trb_thy.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd4_trb_thy.svg", width = 2.6, height = 1.7)

TRB CD4 Lymph nodes

## All
colnames(prop.table.trb2_filter)
## Error in is.data.frame(x): object 'prop.table.trb2_filter' not found
prop.table.trb2_filter.cd4 <- prop.table.trb_filter[,5:18]
prop.table.trb2_filter.cd4$sum <- rowSums(prop.table.trb2_filter.cd4)

prop.table.trb2_filter.cd4 <- prop.table.trb2_filter.cd4 %>% filter(sum >0) %>% select(-sum)

res.pca.trb.cd4 <- prcomp(t(prop.table.trb2_filter.cd4), scale = TRUE, center = T)

mdres.pca.trb.cd4  <-  colnames(prop.table.trb_filter[,5:18])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
 

## LN
prop.table.trb_filter %>% colnames
##  [1] "CD4 Lymph nodes CA Exp01"   "CD4 Lymph nodes CA Exp02"  
##  [3] "CD4 Lymph nodes CA Exp03"   "CD4 Lymph nodes CAKR Exp02"
##  [5] "CD4 Lymph nodes CAKR Exp03" "CD4 Lymph nodes WT Exp02"  
##  [7] "CD4 Lymph nodes WT Exp03"   "CD4 Thymus CA Exp01"       
##  [9] "CD4 Thymus CA Exp03"        "CD4 Thymus CAKR Exp01"     
## [11] "CD4 Thymus CAKR Exp02"      "CD4 Thymus CAKR Exp03"     
## [13] "CD4 Thymus WT Exp02"        "CD4 Thymus WT Exp03"       
## [15] "CD8 Lymph nodes CA Exp02"   "CD8 Lymph nodes CA Exp03"  
## [17] "CD8 Lymph nodes CAKR Exp02" "CD8 Lymph nodes CAKR Exp03"
## [19] "CD8 Lymph nodes WT Exp01"   "CD8 Lymph nodes WT Exp02"  
## [21] "CD8 Lymph nodes WT Exp03"   "CD8 Thymus CA Exp01"       
## [23] "CD8 Thymus CA Exp03"        "CD8 Thymus CAKR Exp02"     
## [25] "CD8 Thymus CAKR Exp03"      "CD8 Thymus WT Exp01"       
## [27] "CD8 Thymus WT Exp02"        "CD8 Thymus WT Exp03"       
## [29] "aaSeqCDR3"                  "allVHitsWithScore"         
## [31] "allDHitsWithScore"          "allJHitsWithScore"
prop.table.trb2_filter.cd4.ln <- prop.table.trb_filter[,5:11]
prop.table.trb2_filter.cd4.ln$sum <- rowSums(prop.table.trb2_filter.cd4.ln)

prop.table.trb2_filter.cd4.ln <- prop.table.trb2_filter.cd4.ln %>% filter(sum >0) %>% select(-sum)

res.pca.trb.cd4.ln <- prcomp(t(prop.table.trb2_filter.cd4.ln), scale = TRUE, center = T)

mdres.pca.trb.cd4.ln  <-  colnames(prop.table.trb_filter[,5:11])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
  

fviz_pca_ind(res.pca.trb.cd4.ln,
             col.ind = as.factor(mdres.pca.trb.cd4.ln$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.trb.cd4.ln$Organ),
                  color = as.factor(mdres.pca.trb.cd4.ln$Strain))) +
  geom_hline(yintercept = 0, color = "grey20") +
  geom_vline(xintercept = 0, color = "grey20") 

#ggsave("final_fig/pca/cd4_trb_ln.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd4_trb_ln.svg", width = 2.6, height = 1.7)

TRB CD8 Thymus

## Thymus
prop.table.trb_filter %>% colnames
##  [1] "CD4 Lymph nodes CA Exp01"   "CD4 Lymph nodes CA Exp02"  
##  [3] "CD4 Lymph nodes CA Exp03"   "CD4 Lymph nodes CAKR Exp02"
##  [5] "CD4 Lymph nodes CAKR Exp03" "CD4 Lymph nodes WT Exp02"  
##  [7] "CD4 Lymph nodes WT Exp03"   "CD4 Thymus CA Exp01"       
##  [9] "CD4 Thymus CA Exp03"        "CD4 Thymus CAKR Exp01"     
## [11] "CD4 Thymus CAKR Exp02"      "CD4 Thymus CAKR Exp03"     
## [13] "CD4 Thymus WT Exp02"        "CD4 Thymus WT Exp03"       
## [15] "CD8 Lymph nodes CA Exp02"   "CD8 Lymph nodes CA Exp03"  
## [17] "CD8 Lymph nodes CAKR Exp02" "CD8 Lymph nodes CAKR Exp03"
## [19] "CD8 Lymph nodes WT Exp01"   "CD8 Lymph nodes WT Exp02"  
## [21] "CD8 Lymph nodes WT Exp03"   "CD8 Thymus CA Exp01"       
## [23] "CD8 Thymus CA Exp03"        "CD8 Thymus CAKR Exp02"     
## [25] "CD8 Thymus CAKR Exp03"      "CD8 Thymus WT Exp01"       
## [27] "CD8 Thymus WT Exp02"        "CD8 Thymus WT Exp03"       
## [29] "aaSeqCDR3"                  "allVHitsWithScore"         
## [31] "allDHitsWithScore"          "allJHitsWithScore"
prop.table.trb2_filter.cd8.thy <- prop.table.trb_filter[,26:32]
prop.table.trb2_filter.cd8.thy$sum <- rowSums(prop.table.trb2_filter.cd8.thy)
## Error in rowSums(prop.table.trb2_filter.cd8.thy): 'x' must be numeric
prop.table.trb2_filter.cd8.thy <- prop.table.trb2_filter.cd8.thy %>% filter(sum >0) %>% select(-sum)
## Error in `filter()`:
## ! Problem while computing `..1 = sum > 0`.
## Caused by error in `sum > 0`:
## ! comparison (6) is possible only for atomic and list types
res.pca.trb.cd8.thy <- prcomp(t(prop.table.trb2_filter.cd8.thy), scale = TRUE, center = T)
## Error in colMeans(x, na.rm = TRUE): 'x' must be numeric
mdres.pca.trb.cd8.thy  <-  colnames(prop.table.trb_filter[,26:32])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
 
fviz_pca_ind(res.pca.trb.cd8.thy,
             col.ind = as.factor(mdres.pca.trb.cd8.thy$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.trb.cd8.thy$Organ),
                  color = as.factor(mdres.pca.trb.cd8.thy$Strain))) +
  geom_hline(yintercept = 0, color = "grey20") +
  geom_vline(xintercept = 0, color = "grey20") 
## Error in .get_facto_class(X): object 'res.pca.trb.cd8.thy' not found
#ggsave("final_fig/pca/cd8_trb_thy.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd8_trb_thy.svg", width = 2.6, height = 1.7)

TRB CD8 Lymph nodes

## LN
prop.table.trb_filter %>% colnames
##  [1] "CD4 Lymph nodes CA Exp01"   "CD4 Lymph nodes CA Exp02"  
##  [3] "CD4 Lymph nodes CA Exp03"   "CD4 Lymph nodes CAKR Exp02"
##  [5] "CD4 Lymph nodes CAKR Exp03" "CD4 Lymph nodes WT Exp02"  
##  [7] "CD4 Lymph nodes WT Exp03"   "CD4 Thymus CA Exp01"       
##  [9] "CD4 Thymus CA Exp03"        "CD4 Thymus CAKR Exp01"     
## [11] "CD4 Thymus CAKR Exp02"      "CD4 Thymus CAKR Exp03"     
## [13] "CD4 Thymus WT Exp02"        "CD4 Thymus WT Exp03"       
## [15] "CD8 Lymph nodes CA Exp02"   "CD8 Lymph nodes CA Exp03"  
## [17] "CD8 Lymph nodes CAKR Exp02" "CD8 Lymph nodes CAKR Exp03"
## [19] "CD8 Lymph nodes WT Exp01"   "CD8 Lymph nodes WT Exp02"  
## [21] "CD8 Lymph nodes WT Exp03"   "CD8 Thymus CA Exp01"       
## [23] "CD8 Thymus CA Exp03"        "CD8 Thymus CAKR Exp02"     
## [25] "CD8 Thymus CAKR Exp03"      "CD8 Thymus WT Exp01"       
## [27] "CD8 Thymus WT Exp02"        "CD8 Thymus WT Exp03"       
## [29] "aaSeqCDR3"                  "allVHitsWithScore"         
## [31] "allDHitsWithScore"          "allJHitsWithScore"
prop.table.trb2_filter.cd8.ln <- prop.table.trb_filter[,19:25]
prop.table.trb2_filter.cd8.ln$sum <- rowSums(prop.table.trb2_filter.cd8.ln)

prop.table.trb2_filter.cd8.ln <- prop.table.trb2_filter.cd8.ln %>% filter(sum >0) %>% select(-sum)

res.pca.trb.cd8.ln <- prcomp(t(prop.table.trb2_filter.cd8.ln), scale = TRUE, center = T)

mdres.pca.trb.cd8.ln  <-  colnames(prop.table.trb_filter[,19:25])  %>% 
  as.data.frame() %>% 
  mutate(sample = stringr::str_replace_all(., pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp"))
  
fviz_pca_ind(res.pca.trb.cd8.ln,
             col.ind = as.factor(mdres.pca.trb.cd8.ln$Strain), # color by groups
             legend.title = "Groups",
             geom.ind = "point",
             invisible="quali", pointsize = 0) +
   scale_color_manual(values=c("dodgerblue","indianred2","gray40")) +
   geom_point(aes(shape = as.factor(mdres.pca.trb.cd8.ln$Organ),
                  color = as.factor(mdres.pca.trb.cd8.ln$Strain))) +
  geom_hline(yintercept = 0, color = "grey20") +
  geom_vline(xintercept = 0, color = "grey20") 

#ggsave("final_fig/pca/cd8_trb_ln.png", width = 2.6, height = 1.7)
#ggsave("final_fig/pca/cd8_trb_ln.svg", width = 2.6, height = 1.7)

Diversity index

To compare the diversity of the TCR repertoires of our samples, we will calculate the Chao1 index using the Immunarch package. Chao1 is a nonparameteric asymptotic estimator of species richness. The higher the Chao1 value, the more diverse repertoire in the sample.

We will calculate the diversity index of TCRa chains without iNKT chains, so we will first exclude them.

# Read metadata
md <- read.csv("metadata_Lck.csv")

# Create filtered dataset
immdata_tra_wonkt <- repFilter(immdata_tra, "by.clonotype",
  list(V.name = exclude("TRAV11"), J.name = exclude("TRAJ18")),
  .match = "substring")
## ALL TRA
repDiversity(immdata_tra_wonkt$data) %>% vis(.by = c( "Mouse_strain"), .meta = immdata_tra$meta, 
  .signif.label.size = 0) + ggtitle("TRA without NKT") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70")) + ggtitle("all TRA", subtitle = "")

# source data
write.csv(repDiversity(immdata_tra_wonkt$data), "rep_diversity_tra.csv")

#ggsave("./final_fig/diversity/chao_tra_wonkt_sign.png", width = 8, height = 13, units = "cm")
#ggsave("./final_fig/diversity/chao_tra_wonkt_sign.svg", width = 8, height = 13, units = "cm")

The TCRb repertoire doesn’t need any filtering, so we can use it as it is.

## ALL TRB
repDiversity(immdata_trb$data) %>% vis(.by = c( "Mouse_strain"), .meta = immdata_trb$meta, 
  .signif.label.size = 0) + ggtitle("TRB without NKT") + scale_fill_manual(values = c("dodgerblue","indianred2","gray70")) + ggtitle("all TRB", subtitle = "")

# source data
write.csv(repDiversity(immdata_trb$data), "rep_diversity_trb.csv")

#ggsave("./final_fig/diversity/chao_trb_sign.png", width = 8, height = 13, units = "cm")
#ggsave("./final_fig/diversity/chao_trb_sign.svg", width = 8, height = 13, units = "cm")

Next, we divide samples by tissue and by cell type (CD4 versus CD8).

## Chao index
repDiversity(immdata_tra_wonkt$data) %>% vis(.by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_tra$meta, .test = F,
  .signif.label.size = 0) + ggtitle("TRA")

#ggsave("./plots/chao_tra.png", width = 15, height = 11, units = "cm")
#ggsave("./plots/chao_tra.svg", width = 15, height = 11, units = "cm")

repDiversity(immdata_trb$data) %>% vis(.by = c("Cell_type", "Organ","Mouse_strain"), .meta = immdata_trb$meta, .test = F,
  .signif.label.size = 0) + ggtitle("TRB", subtitle = "")

#ggsave("./plots/chao_trb.png", width = 15, height = 11, units = "cm")
#ggsave("./plots/chao_trb.svg", width = 15, height = 11, units = "cm")
chao_tra <- as.data.frame(repDiversity(immdata_tra_wonkt$data))
chao_trb <- as.data.frame(repDiversity(immdata_trb$data))
chao_tra2 <- chao_tra %>% 
  rownames_to_column("num_id") %>% 
mutate(num_id = str_replace_all(string = num_id, c("new_lib_24" = "lib24_S24_L001",  "new_lib_25" = "lib25_S25_L001"))) %>% 
  separate(num_id, into = c("num_id",NA,NA), sep = "_") %>% 
  mutate(num_id = str_replace(num_id, "lib","")) %>% 
  mutate_at(vars("num_id"), as.numeric) 
  

levels_cd4 <- c("CD4 Thymus WT", "CD4 Thymus CA", "CD4 Thymus CAKR", "CD4 Lymph nodes WT", "CD4 Lymph nodes CA", "CD4 Lymph nodes CAKR")

# Attach metadata sent for sequencing

chao_tra2 <- left_join(chao_tra2, md) %>% select(value = Estimator, Cell_type, Organ, Mouse_strain) %>%
   mutate(sample_type = paste(Cell_type, Organ, Mouse_strain)) 

chao_tra2 %>% filter(Cell_type == "CD4") %>% 
  ggplot(aes(y = value, x = factor(sample_type, levels = levels_cd4))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Chao index estimate") + 
  ggtitle("CD4 TRA") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")

#ggsave("./final_fig/diversity/chao_cd4_tra.png", width = 8, height = 10, units = "cm")
#ggsave("./final_fig/diversity/chao_cd4_tra.eps", width = 8, height = 10, units = "cm")

levels_cd8 <- c("CD8 Thymus WT", "CD8 Thymus CA", "CD8 Thymus CAKR", "CD8 Lymph nodes WT", "CD8 Lymph nodes CA", "CD8 Lymph nodes CAKR")

chao_tra2 %>% filter(Cell_type == "CD8") %>% 
  ggplot(aes(y = value, x = factor(sample_type, levels = levels_cd8))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Chao index estimate") + 
  ggtitle("CD8 TRA") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")

#ggsave("./final_fig/diversity/chao_cd8_tra.png", width = 8, height = 10, units = "cm")
#ggsave("./final_fig/diversity/chao_cd8_tra.eps", width = 8, height = 10, units = "cm")
chao_trb2 <- chao_trb %>% 
  rownames_to_column("num_id") %>% mutate(num_id = str_replace_all(string = num_id, c("new_lib_24" = "lib24_S24_L001",  "new_lib_25" = "lib25_S25_L001"))) %>% 
  separate(num_id, into = c("num_id",NA,NA), sep = "_") %>% 
  mutate(num_id = str_replace(num_id, "lib",""))  %>% 
  mutate_at(vars("num_id"), as.numeric) 
  

levels_cd4 <- c("CD4 Thymus WT", "CD4 Thymus CA", "CD4 Thymus CAKR", "CD4 Lymph nodes WT", "CD4 Lymph nodes CA", "CD4 Lymph nodes CAKR")

chao_trb2 <- left_join(chao_trb2, md) %>% select(value = Estimator, Cell_type, Organ, Mouse_strain) %>%
   mutate(sample_type = paste(Cell_type, Organ, Mouse_strain)) 

chao_trb2 %>% filter(Cell_type == "CD4") %>% 
  ggplot(aes(y = value, x = factor(sample_type, levels = levels_cd4))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Chao index estimate") + 
  ggtitle("CD4 TRB") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,125500)) + xlab("")

#ggsave("./final_fig/diversity/chao_cd4_trb.png", width = 8, height = 10, units = "cm")
#ggsave("./final_fig/diversity/chao_cd4_trb.eps", width = 8, height = 10, units = "cm")

levels_cd8 <- c("CD8 Thymus WT", "CD8 Thymus CA", "CD8 Thymus CAKR", "CD8 Lymph nodes WT", "CD8 Lymph nodes CA", "CD8 Lymph nodes CAKR")


chao_trb2 %>% filter(Cell_type == "CD8") %>% 
  ggplot(aes(y = value, x = factor(sample_type, levels = levels_cd8))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Mouse_strain)) + 
  ylab("Chao index estimate") + 
  ggtitle("CD8 TRB") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,125500)) + xlab("")

#ggsave("./final_fig/diversity/chao_cd8_trb.png", width = 8, height = 10, units = "cm")
#ggsave("./final_fig/diversity/chao_cd8_trb.eps", width = 8, height = 10, units = "cm")
chao_trb2 %>% 
  ggplot(aes(y = value, x = factor(Mouse_strain, levels = c("WT","CA","CAKR")))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.1), size = 2, aes(color = Mouse_strain)) +
ylab("Chao index estimate") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")

#ggsave("./final_fig/diversity/chao_all_trb.png", width = 8, height = 6, units = "cm")
#ggsave("./final_fig/diversity/chao_all_trb.eps", width = 8, height = 6, units = "cm")


chao_tra2 %>% 
  ggplot(aes(y = value, x = factor(Mouse_strain, levels = c("WT","CA","CAKR")))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.1), size = 2, aes(color = Mouse_strain)) +
ylab("Chao index estimate") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")

#ggsave("./final_fig/diversity/chao_all_tra.png", width = 8, height = 6, units = "cm")
#ggsave("./final_fig/diversity/chao_all_tra.eps", width = 8, height = 6, units = "cm")

Top20 WT LN sequences

In the next analyses, we will figure out whether the sequences that are abundant in the periphery of WT animals are also present in LckKO animals. We will select the top 20 most abundant sequences from the lymph nodes of WT animals by calculating the average normalized frequency:

## TRA
topclones <- prop.table.tra_filter %>% 
  mutate(topclones_ln_cd4 = (`CD4 Lymph nodes WT Exp02`+
                               `CD4 Lymph nodes WT Exp03`)/2,
         topclones_ln_cd8 = (`CD8 Lymph nodes WT Exp01`+
                               `CD8 Lymph nodes WT Exp02`+
                               `CD8 Lymph nodes WT Exp03`)/3)

cd4_topclones_20sequences_tra <-
  pull(topclones %>% slice_max(order_by = topclones_ln_cd4, n = 20),
       aaSeqCDR3)
cd8_topclones_20sequences_tra <-
  pull(topclones %>% slice_max(order_by = topclones_ln_cd8, n = 20),
       aaSeqCDR3)

## TRB
topclones <- prop.table.trb_filter %>% 
  mutate(topclones_ln_cd4 = (`CD4 Lymph nodes WT Exp02`+
                               `CD4 Lymph nodes WT Exp03`)/2,
         topclones_ln_cd8 = (`CD8 Lymph nodes WT Exp01`+
                               `CD8 Lymph nodes WT Exp02`+
                               `CD8 Lymph nodes WT Exp03`)/3)

cd4_topclones_20sequences_trb <- pull(topclones %>% 
                                        slice_max(order_by = topclones_ln_cd4, n = 20), aaSeqCDR3)
cd8_topclones_20sequences_trb <- pull(topclones %>% 
                                        slice_max(order_by = topclones_ln_cd8, n = 20), aaSeqCDR3)

Top 20 most abundant TRA sequences in CD4 WT LNs:

cd4_topclones_20sequences_tra
##  [1] "CAASANSGTYQRF"  "CAASASSGSWQLIF" "CAAASSGSWQLIF"  "CAASDYSNNRLTL" 
##  [5] "CAASDTNTGKLTF"  "CAASRNSNNRIFF"  "CAASDQGGRALIF"  "CAASNTNTGKLTF" 
##  [9] "CAASPNTNKVVF"   "CAASSNTNKVVF"   "CAAGTGGYKVVF"   "CAASITGNTGKLIF"
## [13] "CAASDTNAYKVIF"  "CAASSSGSWQLIF"  "CAATGNTGKLIF"   "CAASGGSNAKLTF" 
## [17] "CAASGTGGYKVVF"  "CAASSGSWQLIF"   "CAARSSGSWQLIF"  "CAARNSNNRIFF"

Top 20 most abundant TRB sequences in CD4 WT LNs:

cd4_topclones_20sequences_trb
##  [1] "CASSLGQNTEVFF"  "CASSGTANSDYTF"  "CASSLDSQNTLYF"  "CASSRDNYAEQFF" 
##  [5] "CASSQGQGSYEQYF" "CASSPTGVQDTQYF" "CAWSLGGQDTQYF"  "CASSRDWGYEQYF" 
##  [9] "CASSRDSYAEQFF"  "CASSRDSQNTLYF"  "CASSLGGQNTLYF"  "CASSLGSQNTLYF" 
## [13] "CASSLSQNTLYF"   "CASSLQGNTEVFF"  "CASGDEQYF"      "CASGDAGGQNTLYF"
## [17] "CASSGTANTEVFF"  "CASSQEGGGTEVFF" "CASSRQGNTEVFF"  "CASSPSSYEQYF"

Top 20 most abundant TRA sequences in CD8 WT LNs:

cd8_topclones_20sequences_tra
##  [1] "CAASASSGSWQLIF"  "CALSDRYNQGKLIF"  "CAASNMGYKLTF"    "CAVSASSGSWQLIF" 
##  [5] "CAVDTNAYKVIF"    "CAAASSGSWQLIF"   "CAASSGSWQLIF"    "CAASDNYAQGLTF"  
##  [9] "CAASDTNAYKVIF"   "CAADTNAYKVIF"    "CAVSNMGYKLTF"    "CAASGTGGYKVVF"  
## [13] "CAAGATGGNNKLTF"  "CALMGYKLTF"      "CAASANSGTYQRF"   "CALGDTNAYKVIF"  
## [17] "CAASDDTNAYKVIF"  "CAASDMGYKLTF"    "CAASAGTGGYKVVF"  "CAMREGSSGSWQLIF"

Top 20 most abundant TRB sequences in CD8 WT LNs:

cd8_topclones_20sequences_trb
##  [1] "CASSDAGYEQYF"    "CASSPGTGGYEQYF"  "CASGDAGEQYF"     "CASGDAGGSAETLYF"
##  [5] "CASSDSAETLYF"    "CASSRDNYAEQFF"   "CASSPGQQDTQYF"   "CASSLGYEQYF"    
##  [9] "CASSLGAEQFF"     "CASSPGQYAEQFF"   "CASSLGGNYAEQFF"  "CASSDWGNYAEQFF" 
## [13] "CASSLGANSDYTF"   "CASSPGQNYAEQFF"  "CASSLGGQDTQYF"   "CASGDAGYEQYF"   
## [17] "CASSLGNYAEQFF"   "CASSRDNYEQYF"    "CASSDSYEQYF"     "CASSLDNYAEQFF"

Summary plots - percentage of the repertoire

Now we will investigate what is the percentage of the repertoire of each sample that is occupied by the top20 WT LN sequences that we identified earlier. We will plot the results.

CD4 TRA

cd4_topclones_rep_pct <- prop.table.tra2 %>% 
  filter(aaSeqCDR3 %in% cd4_topclones_20sequences_tra) %>% 
  select(aaSeqCDR3, starts_with("CD4")) %>% 
  select(1,14,15,9:13,7,8,2:6) 

cd4_topclones_rep_pct2 <- as.data.frame(colSums(cd4_topclones_rep_pct[,2:15]))
colnames(cd4_topclones_rep_pct2) <- "value"

cd4_topclones_rep_pct2 <- cd4_topclones_rep_pct2 %>% 
  rownames_to_column("sample") %>% 
  mutate(sample = stringr::str_replace_all(sample, pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp")) %>% 
  mutate(Cell_type_organ = factor(paste(Organ, Strain), 
                                  levels = c("Thymus WT", "Thymus CA", "Thymus CAKR",
                                             "LN WT", "LN CA", "LN CAKR")))

cd4_topclones_rep_pct2 %>% 
  filter(Organ == "LN") %>% 
  ggplot(aes(y = value*100, x = factor(Cell_type_organ))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Strain)) + 
  ylab("% of repertoire") + 
  ggtitle("CD4 TRA") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")

# source data
write.csv(cd4_topclones_rep_pct2 %>% 
  filter(Organ == "LN") %>%
  mutate(value = value*100), 
  file = "topclones_cd4_tra.csv")

#ggsave("./final_fig/20topseqs/20topseqs_pct_tra_cd4.png", width = 5, height = 5.7, units = "cm")
#ggsave("./final_fig/20topseqs/20topseqs_pct_tra_cd4.eps", width = 5, height = 5.7, units = "cm")

CD4 TRB

cd4_topclones_rep_pct <- prop.table.trb2 %>% 
  filter(aaSeqCDR3 %in% cd4_topclones_20sequences_trb) %>% 
  select(aaSeqCDR3, starts_with("CD4")) %>% 
  select(1,14,15,9:13,7,8,2:6) 

cd4_topclones_rep_pct2 <- as.data.frame(colSums(cd4_topclones_rep_pct[,2:15]))
colnames(cd4_topclones_rep_pct2) <- "value"

cd4_topclones_rep_pct2 <- cd4_topclones_rep_pct2 %>% 
  rownames_to_column("sample") %>% 
  mutate(sample = stringr::str_replace_all(sample, pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp")) %>% 
  mutate(Cell_type_organ = factor(paste(Organ, Strain), levels = c("Thymus WT","Thymus CA", "Thymus CAKR",
                                                                   "LN WT","LN CA", "LN CAKR" )))

cd4_topclones_rep_pct2 %>% 
  filter(Organ == "LN") %>% 
  ggplot(aes(y = value*100, x = factor(Cell_type_organ))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Strain)) + 
  ylab("% of repertoire") + 
  ggtitle("CD4 TRB") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")

# source data
write.csv(cd4_topclones_rep_pct2 %>% 
  filter(Organ == "LN") %>%
  mutate(value = value*100), 
  file = "topclones_cd4_trb.csv")

#ggsave("./final_fig/20topseqs/20topseqs_pct_trb_cd4.png", width = 5.4, height = 5.7, units = "cm")
#ggsave("./final_fig/20topseqs/20topseqs_pct_trb_cd4.eps", width = 5.4, height = 5.7, units = "cm")

CD8 TRA

cd8_topclones_rep_pct <- prop.table.tra2 %>% 
  filter(aaSeqCDR3 %in% cd8_topclones_20sequences_tra) %>% 
  select(aaSeqCDR3, starts_with("CD8")) %>% 
  select(1,13:15,9:12,6:8,2:5) 

cd8_topclones_rep_pct2 <- as.data.frame(colSums(cd8_topclones_rep_pct[,2:15]))
colnames(cd8_topclones_rep_pct2) <- "value"

cd8_topclones_rep_pct2 <- cd8_topclones_rep_pct2 %>% 
  rownames_to_column("sample") %>% 
  mutate(sample = stringr::str_replace_all(sample, pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp")) %>% 
  mutate(Cell_type_organ = factor(paste(Organ, Strain), levels = c("Thymus WT","Thymus CA", "Thymus CAKR",
                                                                   "LN WT","LN CA", "LN CAKR" )))

cd8_topclones_rep_pct2
##    Cell_type  Organ Strain   Exp      value Cell_type_organ
## 1        CD8 Thymus     WT Exp01 0.03400861       Thymus WT
## 2        CD8 Thymus     WT Exp02 0.02653846       Thymus WT
## 3        CD8 Thymus     WT Exp03 0.03137384       Thymus WT
## 4        CD8 Thymus     CA Exp01 0.02447936       Thymus CA
## 5        CD8 Thymus     CA Exp03 0.02870174       Thymus CA
## 6        CD8 Thymus   CAKR Exp02 0.03063281     Thymus CAKR
## 7        CD8 Thymus   CAKR Exp03 0.02280437     Thymus CAKR
## 8        CD8     LN     WT Exp01 0.03753115           LN WT
## 9        CD8     LN     WT Exp02 0.03764212           LN WT
## 10       CD8     LN     WT Exp03 0.03642344           LN WT
## 11       CD8     LN     CA Exp02 0.03592586           LN CA
## 12       CD8     LN     CA Exp03 0.03213014           LN CA
## 13       CD8     LN   CAKR Exp02 0.02946860         LN CAKR
## 14       CD8     LN   CAKR Exp03 0.03059997         LN CAKR
cd8_topclones_rep_pct2 %>% 
  filter(Organ == "LN") %>% 
  ggplot(aes(y = value*100, x = factor(Cell_type_organ))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Strain)) + 
  ylab("% of repertoire") + 
  ggtitle("CD8 TRA") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")

# source data
write.csv(cd8_topclones_rep_pct2 %>% 
  filter(Organ == "LN") %>%
  mutate(value = value*100), 
  file = "topclones_cd8_tra.csv")

#ggsave("./final_fig/20topseqs/20topseqs_pct_tra_cd8.png", width = 5, height = 5.7, units = "cm")
#ggsave("./final_fig/20topseqs/20topseqs_pct_tra_cd8.eps", width = 5, height = 5.7, units = "cm")

CD8 TRB

cd8_topclones_rep_pct <- prop.table.trb2 %>% 
  filter(aaSeqCDR3 %in% cd8_topclones_20sequences_trb) %>% 
  select(aaSeqCDR3, starts_with("CD8")) %>% 
  select(1,13:15,9:12,6:8,2:5) 

cd8_topclones_rep_pct2 <- as.data.frame(colSums(cd8_topclones_rep_pct[,2:15]))
colnames(cd8_topclones_rep_pct2) <- "value"

cd8_topclones_rep_pct2 <- cd8_topclones_rep_pct2 %>% 
  rownames_to_column("sample") %>% 
  mutate(sample = stringr::str_replace_all(sample, pattern = "Lymph nodes", replacement = "LN")) %>% 
  separate(sample, into = c("Cell_type","Organ","Strain","Exp")) %>% 
  mutate(Cell_type_organ = factor(paste(Organ, Strain), levels = c("Thymus WT","Thymus CA", "Thymus CAKR",
                                                                   "LN WT","LN CA", "LN CAKR" )))

cd8_topclones_rep_pct2
##    Cell_type  Organ Strain   Exp       value Cell_type_organ
## 1        CD8 Thymus     WT Exp01 0.004631440       Thymus WT
## 2        CD8 Thymus     WT Exp02 0.004479484       Thymus WT
## 3        CD8 Thymus     WT Exp03 0.004929262       Thymus WT
## 4        CD8 Thymus     CA Exp01 0.005472183       Thymus CA
## 5        CD8 Thymus     CA Exp03 0.004517467       Thymus CA
## 6        CD8 Thymus   CAKR Exp02 0.003983029     Thymus CAKR
## 7        CD8 Thymus   CAKR Exp03 0.005094131     Thymus CAKR
## 8        CD8     LN     WT Exp01 0.007470700           LN WT
## 9        CD8     LN     WT Exp02 0.006188663           LN WT
## 10       CD8     LN     WT Exp03 0.007226656           LN WT
## 11       CD8     LN     CA Exp02 0.006775749           LN CA
## 12       CD8     LN     CA Exp03 0.007135684           LN CA
## 13       CD8     LN   CAKR Exp02 0.007553967         LN CAKR
## 14       CD8     LN   CAKR Exp03 0.006905450         LN CAKR
cd8_topclones_rep_pct2 %>% 
  filter(Organ == "LN") %>% 
  ggplot(aes(y = value*100, x = factor(Cell_type_organ))) +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  geom_point(aes(color = Strain)) + 
  ylab("% of repertoire") + 
  ggtitle("CD8 TRB") + 
  theme(axis.text.x = element_text(angle = 90)) + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,NA)) + xlab("")

# source data
write.csv(cd8_topclones_rep_pct2 %>% 
  filter(Organ == "LN") %>%
  mutate(value = value*100), 
  file = "topclones_cd8_trb.csv")

#ggsave("./final_fig/20topseqs/20topseqs_pct_trb_cd8.png", width = 5.4, height = 5.7, units = "cm")
#ggsave("./final_fig/20topseqs/20topseqs_pct_trb_cd8.eps", width = 5.4, height = 5.7, units = "cm")

Heatmaps

In this part, we will use heatmaps to visualize the frequency of the sequences that were the most abundant (top20) in the periphery of WT mice in all samples. In the supplementary figure in the manuscript, reference samples (periphery of WT mice) are highlighted with black border, which was added manually in a graphics editor.

TRA

CD4

Heatmap showing the frequency of the top20 CD4 TRA sequences from WT LNs in all samples.

# TRA
cd4_topclones_heatmap_matrix <- prop.table.tra2 %>% 
  filter(aaSeqCDR3 %in% cd4_topclones_20sequences_tra) %>% 
  select(aaSeqCDR3, starts_with("CD4")) %>% 
  select(1,14,15,9:13,7,8,2:6)

cd4_topclones_heatmap_matrix2 <- as.matrix(cd4_topclones_heatmap_matrix[,2:15])
rownames(cd4_topclones_heatmap_matrix2) <- cd4_topclones_heatmap_matrix$aaSeqCDR3

cd4_topclones_heatmap_matrix2 <- cd4_topclones_heatmap_matrix2[match(cd4_topclones_20sequences_tra, rownames(cd4_topclones_heatmap_matrix2)),]

pheatmap(cd4_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F)

#ggsave(plot = pheatmap(cd4_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F), "final_fig/heatmap_top20_ordered/cd4_tra_noscale.svg")

CD8

Heatmap showing the frequency of the top20 CD8 TRA sequences from WT LNs in all samples.

# TRA

cd8_topclones_heatmap_matrix <- prop.table.tra2 %>% 
  filter(aaSeqCDR3 %in% cd8_topclones_20sequences_tra) %>% 
  select(aaSeqCDR3, starts_with("CD8")) %>% 
  select(1,13:15,9:12,6:8,2:5)
  

cd8_topclones_heatmap_matrix2 <- as.matrix(cd8_topclones_heatmap_matrix[,2:15])
rownames(cd8_topclones_heatmap_matrix2) <- cd8_topclones_heatmap_matrix$aaSeqCDR3

cd8_topclones_heatmap_matrix2 <- cd8_topclones_heatmap_matrix2[match(cd8_topclones_20sequences_tra, rownames(cd8_topclones_heatmap_matrix2)),]

pheatmap(cd8_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F)

#ggsave(plot = pheatmap(cd8_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F), "final_fig/heatmap_top20_ordered/cd8_tra_noscale.svg")

As we can see that the first sequence takes up the most percentage of the repertoire, we will also plot another heatmap without the first sequence.

### without 1st sequence

cd8_topclones_heatmap_matrix2 <- as.matrix(cd8_topclones_heatmap_matrix[2:20,2:15])
rownames(cd8_topclones_heatmap_matrix2) <- cd8_topclones_heatmap_matrix$aaSeqCDR3[2:20]

cd8_topclones_heatmap_matrix2 <- cd8_topclones_heatmap_matrix2[match(cd8_topclones_20sequences_tra[2:20], rownames(cd8_topclones_heatmap_matrix2)),]

pheatmap(cd8_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F)

#ggsave(plot = pheatmap(cd8_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F), "final_fig/heatmap_top20_ordered/cd8_tra_noscale_without.svg")

TRB

CD4

Heatmap showing the frequency of the top20 CD4 TRB sequences from WT LNs in all samples.

# trb

cd4_topclones_heatmap_matrix <- prop.table.trb2 %>% filter(aaSeqCDR3 %in% cd4_topclones_20sequences_trb) %>% select(aaSeqCDR3, starts_with("CD4")) %>% select(1,14,15,9:13,7,8,2:6)
  
cd4_topclones_heatmap_matrix2 <- as.matrix(cd4_topclones_heatmap_matrix[,2:15])
rownames(cd4_topclones_heatmap_matrix2) <- cd4_topclones_heatmap_matrix$aaSeqCDR3

cd4_topclones_heatmap_matrix2 <- cd4_topclones_heatmap_matrix2[match( cd4_topclones_20sequences_trb, rownames(cd4_topclones_heatmap_matrix2)),]

pheatmap(cd4_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F)

#ggsave(plot = pheatmap(cd4_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F), "final_fig/heatmap_top20_ordered/cd4_trb_noscale.svg")

CD8

Heatmap showing the frequency of the top20 CD8 TRB sequences from WT LNs in all samples.

# trb

cd8_topclones_heatmap_matrix <- prop.table.trb2 %>% filter(aaSeqCDR3 %in% cd8_topclones_20sequences_trb) %>% select(aaSeqCDR3, starts_with("CD8")) %>% select(1,13:15,9:12,6:8,2:5)
  
cd8_topclones_heatmap_matrix2 <- as.matrix(cd8_topclones_heatmap_matrix[,2:15])
rownames(cd8_topclones_heatmap_matrix2) <- cd8_topclones_heatmap_matrix$aaSeqCDR3

cd8_topclones_heatmap_matrix2 <- cd8_topclones_heatmap_matrix2[match( cd8_topclones_20sequences_trb, rownames(cd8_topclones_heatmap_matrix2)),]

pheatmap(cd8_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F)

# png("final_fig/heatmap_top20_ordered/cd8_trb_noscale.png", width = 400, height = 500)
# pheatmap(cd8_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F)
# dev.off()

#ggsave(plot = pheatmap(cd8_topclones_heatmap_matrix2, scale = "none", cluster_rows = F, cluster_cols = F), "final_fig/heatmap_top20_ordered/cd8_trb_noscale.svg")

Repertoire overlap

binary_tra <- excel_count_table_tra %>% 
  mutate(nkt_trav11_traj18 = if_else(
    (grepl(allVHitsWithScore, pattern = "TRAV11") | 
    (grepl(allJHitsWithScore, pattern = "TRAJ18"))),"yes","no")) %>% 
  filter(nkt_trav11_traj18 == "no") %>% 
  mutate_at(vars(starts_with("CD")), .funs = binary) 

# count_overlap <- function(df, column_number){
#   repertoire <- df %>% select(vars(column_number), aaSeqCDR3) %>% filter(vars(column_number)>0) %>% pull("aaSeqCDR3")
#   nrow_column <- length(repertoire)
#   intersect_rept <- length(intersect(repertoire, most_diverse_repertoire))
#   pct <- intersect_rept/length()
#   return(x)
# }

binary_tra_longer <- binary_tra %>% select(starts_with("CD"), aaSeqCDR3) %>% pivot_longer(!aaSeqCDR3, names_to = "num_id")

df_all4 <- data.frame("")

for(j in levels(factor(binary_tra_longer$num_id))){
  subset1 <- binary_tra_longer %>% filter(num_id == j & value>0) %>% pull("aaSeqCDR3")
  vector_overlap <- c()
  for(i in levels(factor(binary_tra_longer$num_id))){
    subset2 <- binary_tra_longer %>% filter(num_id == i & value>0) %>% pull("aaSeqCDR3")
    intersect_rep <- length(intersect(subset1, subset2))
    total <- length(subset2)
    vector_overlap <- c(vector_overlap,intersect_rep/total)
  }
  df <- as.data.frame(x = vector_overlap)
  colnames(df) <- j
  df
  df_all4 <- cbind(df_all4, df)
}

df_all4 <- df_all4[,2:29]
rownames(df_all4) <- colnames(df_all4)

#write.csv(df_all4, "rep_overlap_tra.csv")
df24 <- df_all4
df24[df24 == 1] <- 0

### plot dotplot of overlaps

df25 <- df24 %>% rownames_to_column("id") %>%  pivot_longer(-id)

g2 <- ggplot(df25, aes(id, factor(name, levels = rev(levels(factor(name)))))) + 
  geom_point(aes(size = value*100, colour = value*100)) + 
  theme_bw() 

g2 + scale_size_continuous(range=c(7,12)) +
  geom_text(aes(label = round(value*100, digits = 1))) + 
  scale_colour_gradient2(low = "lightskyblue", mid = "lightsteelblue2", high = "salmon") +
  theme(axis.text.x = element_text(angle = 90)) 

Rozdelene na CD4 a CD8

rep_overlap_tra <- read_csv("rep_overlap_tra.csv") %>% column_to_rownames("...1")

rep_overlap_tra_cd4 <- rep_overlap_tra[1:14,1:14]
rep_overlap_tra_cd8 <- rep_overlap_tra[15:28,15:28]

rep_overlap_tra_cd4 <- rep_overlap_tra_cd4[ rev(c(5, 4, 3, 2, 1, 7, 6, 12, 11, 10, 9, 8, 14, 13)),
                                            rev(c(5, 4, 3, 2, 1, 7, 6, 12, 11, 10, 9, 8, 14, 13))]

levels_names <- colnames(rep_overlap_tra_cd4)

df_overlap_tra_cd4 <- rep_overlap_tra_cd4
df_overlap_tra_cd4[df_overlap_tra_cd4 == 1] <- 0

### plot dotplot of overlaps

df_overlap_tra_cd4 <- df_overlap_tra_cd4 %>% rownames_to_column("id") %>%  pivot_longer(-id)

g2 <- ggplot(df_overlap_tra_cd4, aes(x = factor(name, levels = levels_names), y = factor(id, levels = rev(levels_names)))) + 
  geom_point(aes(size = value*100, colour = value*100)) + 
  theme_bw() 

g2 + scale_size_continuous(range=c(5,9)) +
  geom_text(aes(label = round(value*100, digits = 1)), size = 3) + 
  scale_colour_gradient2(low = "white", mid = "lightsteelblue", high = "salmon") +
  theme(axis.text.x = element_text(angle = 90)) + ggtheme()

#ggsave(file = "./final_fig/overlap_cd4_tra.png", width = 20, height = 16, units = "cm")
#ggsave(file = "./final_fig/overlap_cd4_tra.svg", width = 20, height = 16, units = "cm")
rep_overlap_tra_cd8 <- rep_overlap_tra_cd8[rev(c(12:14,8:11,5:7,1:4)),rev(c(12:14,8:11,5:7,1:4))]

levels_names <- rev(colnames(rep_overlap_tra_cd8))

df_overlap_tra_cd8 <- rep_overlap_tra_cd8
df_overlap_tra_cd8[df_overlap_tra_cd8 == 1] <- 0

### plot dotplot of overlaps

df_overlap_tra_cd8 <- df_overlap_tra_cd8 %>% rownames_to_column("id") %>%  pivot_longer(-id)

g2 <- ggplot(df_overlap_tra_cd8, aes(factor(name, levels = levels_names), factor(id, levels = rev(levels_names)))) + 
  geom_point(aes(size = value*100, colour = value*100)) + 
  theme_bw() 

g2 + scale_size_continuous(range=c(5,9)) +
  geom_text(aes(label = round(value*100, digits = 1)), size = 3) + 
  scale_colour_gradient2(low = "white", mid = "lightsteelblue", high = "salmon") +
  theme(axis.text.x = element_text(angle = 90)) + ggtheme()

#ggsave(file = "./final_fig/overlap_cd8_tra.png", width = 20, height = 16, units = "cm")
#ggsave(file = "./final_fig/overlap_cd8_tra.svg", width = 20, height = 16, units = "cm")

A la violin

violin_tra_cd4 <- rep_overlap_tra_cd4 %>% 
  rownames_to_column("sample1") %>% 
  pivot_longer(!sample1, names_to = "sample2") %>% 
  mutate(sample2 = stringr::str_replace_all(sample2, pattern = "Lymph nodes", replacement = "LN"))  %>% 
  mutate(sample1 = stringr::str_replace_all(sample1, pattern = "Lymph nodes", replacement = "LN"))  %>% 
  separate(sample1, into = c("Cell_type1","Organ1","Strain1","Exp1"), remove = F)  %>% 
  separate(sample2, into = c("Cell_type2","Organ2","Strain2","Exp2"), remove = F) %>% 
  filter(value<1)


violin_tra_cd8 <- rep_overlap_tra_cd8 %>% 
  rownames_to_column("sample1") %>% 
  pivot_longer(!sample1, names_to = "sample2") %>% 
  mutate(sample2 = stringr::str_replace_all(sample2, pattern = "Lymph nodes", replacement = "LN"))  %>% 
  mutate(sample1 = stringr::str_replace_all(sample1, pattern = "Lymph nodes", replacement = "LN"))  %>% 
  separate(sample1, into = c("Cell_type1","Organ1","Strain1","Exp1"), remove = F)  %>% 
  separate(sample2, into = c("Cell_type2","Organ2","Strain2","Exp2"), remove = F) %>% 
  filter(value<1)
violin_tra_cd4 %>% filter(Organ1 == "Thymus" & Organ2 == "Thymus") %>% 
  mutate(organ_strain1 = paste(Organ1, Strain1), 
                          organ_strain2 = paste(Organ2, Strain2)) %>% 
  mutate(comparison = paste(organ_strain1, organ_strain2, sep = " - ")) %>% 
  filter(comparison %in% c("Thymus WT - Thymus WT", "Thymus CA - Thymus WT", "Thymus CAKR - Thymus WT")) %>% 
  ggplot(aes(x = comparison, y = value)) +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.2), size = 2, aes(color = comparison)) +
  ggtitle("CD4 TRA Thymus") +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  ylab("% repertoire overlap") + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,0.35)) + xlab("")

#ggsave("./final_fig/overlaps/rep_overlap_cd4_tra_thy.png", width = 9.5, height = 9.5,  units = "cm")
#ggsave("./final_fig/overlaps/rep_overlap_cd4_tra_thy.eps", width = 9.5, height = 9.5,  units = "cm")
violin_tra_cd4 %>% filter(Organ1 == "LN" & Organ2 == "LN") %>% 
  mutate(organ_strain1 = paste(Organ1, Strain1), 
                          organ_strain2 = paste(Organ2, Strain2)) %>% 
  mutate(comparison = paste(organ_strain1, organ_strain2, sep = " - ")) %>% 
  filter(comparison %in% c("LN WT - LN WT", "LN CA - LN WT", "LN CAKR - LN WT")) %>% 
  ggplot(aes(x = comparison, y = value)) +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.2), size = 2, aes(color = comparison)) +
  ggtitle("CD4 TRA LN") +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  ylab("% repertoire overlap") + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,0.35)) + xlab("")

#ggsave("./final_fig/overlaps/rep_overlap_cd4_tra_LN.png", width = 8, height = 8,  units = "cm")
#ggsave("./final_fig/overlaps/rep_overlap_cd4_tra_LN.eps", width = 8, height = 8,  units = "cm")
violin_tra_cd8 %>% filter(Organ1 == "Thymus" & Organ2 == "Thymus") %>% 
  mutate(organ_strain1 = paste(Organ1, Strain1), 
                          organ_strain2 = paste(Organ2, Strain2)) %>% 
  mutate(comparison = paste(organ_strain1, organ_strain2, sep = " - ")) %>% 
  filter(comparison %in% c("Thymus WT - Thymus WT", "Thymus CA - Thymus WT", "Thymus CAKR - Thymus WT")) %>% 
  ggplot(aes(x = comparison, y = value)) +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.2), size = 2, aes(color = comparison)) +
  ggtitle("CD8 TRA Thymus") +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  ylab("% repertoire overlap") + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,0.35)) + xlab("")

#ggsave("./final_fig/overlaps/rep_overlap_cd8_tra_thy.png", width = 9.5, height = 9.5,  units = "cm")
#ggsave("./final_fig/overlaps/rep_overlap_cd8_tra_thy.eps", width = 9.5, height = 9.5,  units = "cm")
violin_tra_cd8 %>% filter(Organ1 == "LN" & Organ2 == "LN") %>% 
  mutate(organ_strain1 = paste(Organ1, Strain1), 
                          organ_strain2 = paste(Organ2, Strain2)) %>% 
  mutate(comparison = paste(organ_strain1, organ_strain2, sep = " - ")) %>% 
  filter(comparison %in% c("LN WT - LN WT", "LN CA - LN WT", "LN CAKR - LN WT")) %>% 
  ggplot(aes(x = comparison, y = value)) +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.2), size = 2, aes(color = comparison)) +
  ggtitle("CD8 TRA LN") +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  ylab("% repertoire overlap") + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,0.35)) + xlab("")

#ggsave("./final_fig/overlaps/rep_overlap_cd8_tra_LN.png", width = 8, height = 8, units = "cm")
#ggsave("./final_fig/overlaps/rep_overlap_cd8_tra_LN.eps", width = 8, height = 8, units = "cm")

TRB

Vsechny vzorky se vsema

binary_trb <- excel_count_table_trb %>% 
  mutate_at(vars(starts_with("CD")), .funs = binary) 

binary_trb_longer <- binary_trb %>% select(starts_with("CD"), aaSeqCDR3) %>% pivot_longer(!aaSeqCDR3, names_to = "num_id")

df_all4 <- data.frame("")

for(j in levels(factor(binary_trb_longer$num_id))){
  subset1 <- binary_trb_longer %>% filter(num_id == j & value>0) %>% pull("aaSeqCDR3")
  vector_overlap <- c()
  for(i in levels(factor(binary_trb_longer$num_id))){
    subset2 <- binary_trb_longer %>% filter(num_id == i & value>0) %>% pull("aaSeqCDR3")
    intersect_rep <- length(intersect(subset1, subset2))
    total <- length(subset2)
    vector_overlap <- c(vector_overlap,intersect_rep/total)
  }
  df <- as.data.frame(x = vector_overlap)
  colnames(df) <- j
  df
  df_all4 <- cbind(df_all4, df)
}

df_all4 <- df_all4[,2:29]
rownames(df_all4) <- colnames(df_all4)

#write.csv(df_all4, "rep_overlap_trb.csv")
df24 <- df_all4
df24[df24 == 1] <- 0

### plot dotplot of overlaps

df25 <- df24 %>% rownames_to_column("id") %>%  pivot_longer(-id)

g2 <- ggplot(df25, aes(id, factor(name, levels = rev(levels(factor(name)))))) + 
  geom_point(aes(size = value*100, colour = value*100)) + 
  theme_bw() 

g2 + scale_size_continuous(range=c(7,12)) +
  geom_text(aes(label = round(value*100, digits = 1))) + 
  scale_colour_gradient2(low = "lightskyblue", mid = "lightsteelblue2", high = "salmon") +
  theme(axis.text.x = element_text(angle = 90)) 

Rozdelene na CD4 a CD8

rep_overlap_trb <- read_csv("rep_overlap_trb.csv") %>% column_to_rownames("...1")

rep_overlap_trb_cd4 <- rep_overlap_trb[1:14,1:14]
rep_overlap_trb_cd8 <- rep_overlap_trb[15:28,15:28]

rep_overlap_trb_cd4 <- rep_overlap_trb_cd4[rev(c(13,14,8:12,6,7,1:5)),rev(c(13,14,8:12,6,7,1:5))]

levels_names <- rev(colnames(rep_overlap_trb_cd4))


df_overlap_trb_cd4 <- rep_overlap_trb_cd4
df_overlap_trb_cd4[df_overlap_trb_cd4 == 1] <- 0

### plot dotplot of overlaps

df_overlap_trb_cd4 <- df_overlap_trb_cd4 %>% rownames_to_column("id") %>%  pivot_longer(-id)

g2 <- ggplot(df_overlap_trb_cd4, aes(factor(name, levels = levels_names), factor(id, levels = rev(levels_names)))) + 
  geom_point(aes(size = value*100, colour = value*100)) + 
  theme_bw() 

g2 + scale_size_continuous(range=c(5,9)) +
  geom_text(aes(label = round(value*100, digits = 1)), size = 3) + 
  scale_colour_gradient2(low = "white", mid = "lightsteelblue", high = "salmon") +
  theme(axis.text.x = element_text(angle = 90)) + ggtheme()

#ggsave(file = "./final_fig/overlap_cd4_trb.png", width = 20, height = 16, units = "cm")
#ggsave(file = "./final_fig/overlap_cd4_trb.svg", width = 20, height = 16, units = "cm")
rep_overlap_trb_cd8 <- rep_overlap_trb_cd8[rev(c(12:14,8:11,5:7,1:4)),rev(c(12:14,8:11,5:7,1:4))]

levels_names <- rev(colnames(rep_overlap_trb_cd8))

df_overlap_trb_cd8 <- rep_overlap_trb_cd8
df_overlap_trb_cd8[df_overlap_trb_cd8 == 1] <- 0

### plot dotplot of overlaps

df_overlap_trb_cd8 <- df_overlap_trb_cd8 %>% rownames_to_column("id") %>%  pivot_longer(-id)

g2 <- ggplot(df_overlap_trb_cd8, aes(factor(name, levels = levels_names), factor(id, levels = rev(levels_names)))) + 
  geom_point(aes(size = value*100, colour = value*100)) + 
  theme_bw() 

g2 + scale_size_continuous(range=c(5,9)) +
  geom_text(aes(label = round(value*100, digits = 1)), size = 3) + 
  scale_colour_gradient2(low = "white", mid = "lightsteelblue", high = "salmon") +
  theme(axis.text.x = element_text(angle = 90)) + ggtheme()

#ggsave(file = "./final_fig/overlap_cd8_trb.png", width = 20, height = 16, units = "cm")
#ggsave(file = "./final_fig/overlap_cd8_trb.svg", width = 20, height = 16, units = "cm")

A la violin

violin_trb_cd4 <- rep_overlap_trb_cd4 %>% 
  rownames_to_column("sample1") %>% 
  pivot_longer(!sample1, names_to = "sample2") %>% 
  mutate(sample2 = stringr::str_replace_all(sample2, pattern = "Lymph nodes", replacement = "LN"))  %>% 
  mutate(sample1 = stringr::str_replace_all(sample1, pattern = "Lymph nodes", replacement = "LN"))  %>% 
  separate(sample1, into = c("Cell_type1","Organ1","Strain1","Exp1"), remove = F)  %>% 
  separate(sample2, into = c("Cell_type2","Organ2","Strain2","Exp2"), remove = F) %>% 
  filter(value<1)

violin_trb_cd8 <- rep_overlap_trb_cd8 %>% 
  rownames_to_column("sample1") %>% 
  pivot_longer(!sample1, names_to = "sample2") %>% 
  mutate(sample2 = stringr::str_replace_all(sample2, pattern = "Lymph nodes", replacement = "LN"))  %>% 
  mutate(sample1 = stringr::str_replace_all(sample1, pattern = "Lymph nodes", replacement = "LN"))  %>% 
  separate(sample1, into = c("Cell_type1","Organ1","Strain1","Exp1"), remove = F)  %>% 
  separate(sample2, into = c("Cell_type2","Organ2","Strain2","Exp2"), remove = F) %>% 
  filter(value<1)
violin_trb_cd8 %>% filter(Organ1 == "Thymus" & Organ2 == "Thymus") %>% 
  mutate(organ_strbin1 = paste(Organ1, Strain1), 
                          organ_strbin2 = paste(Organ2, Strain2)) %>% 
  mutate(comparison = paste(organ_strbin1, organ_strbin2, sep = " - ")) %>% 
  filter(comparison %in% c("Thymus WT - Thymus WT", "Thymus CA - Thymus WT", "Thymus CAKR - Thymus WT")) %>% 
  ggplot(aes(x = comparison, y = value)) +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.2), size = 2, aes(color = comparison)) +
  ggtitle("CD8 TRB Thymus") +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  ylab("% repertoire overlap") + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,0.25)) + xlab("")

#ggsave("./final_fig/overlaps/rep_overlap_cd8_trb_thy.png", width = 9.5, height = 9.5,  units = "cm")
#ggsave("./final_fig/overlaps/rep_overlap_cd8_trb_thy.eps", width = 9.5, height = 9.5,  units = "cm")
violin_trb_cd8 %>% filter(Organ1 == "LN" & Organ2 == "LN") %>% 
  mutate(organ_strbin1 = paste(Organ1, Strain1), 
                          organ_strbin2 = paste(Organ2, Strain2)) %>% 
  mutate(comparison = paste(organ_strbin1, organ_strbin2, sep = " - ")) %>% 
  filter(comparison %in% c("LN WT - LN WT", "LN CA - LN WT", "LN CAKR - LN WT")) %>% 
  ggplot(aes(x = comparison, y = value)) +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.2), size = 2, aes(color = comparison)) +
  ggtitle("CD8 TRB LN") +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  ylab("% repertoire overlap") + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,0.25)) + xlab("")

#ggsave("./final_fig/overlaps/rep_overlap_cd8_trb_LN.png", width = 8, height = 8,  units = "cm")
#ggsave("./final_fig/overlaps/rep_overlap_cd8_trb_LN.eps", width = 8, height = 8,  units = "cm")
violin_trb_cd4 %>% filter(Organ1 == "Thymus" & Organ2 == "Thymus") %>% 
  mutate(organ_strbin1 = paste(Organ1, Strain1), 
                          organ_strbin2 = paste(Organ2, Strain2)) %>% 
  mutate(comparison = paste(organ_strbin1, organ_strbin2, sep = " - ")) %>% 
  filter(comparison %in% c("Thymus WT - Thymus WT", "Thymus CA - Thymus WT", "Thymus CAKR - Thymus WT")) %>% 
  ggplot(aes(x = comparison, y = value)) +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.2), size = 2, aes(color = comparison)) +
  ggtitle("CD4 TRB Thymus") +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  ylab("% repertoire overlap") + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,0.20)) + xlab("")

#ggsave("./final_fig/overlaps/rep_overlap_cd4_trb_thy.png", width = 9.5, height = 9.5,  units = "cm")
#ggsave("./final_fig/overlaps/rep_overlap_cd4_trb_thy.eps", width = 9.5, height = 9.5,  units = "cm")
violin_trb_cd4 %>% filter(Organ1 == "LN" & Organ2 == "LN") %>% 
  mutate(organ_strbin1 = paste(Organ1, Strain1), 
                          organ_strbin2 = paste(Organ2, Strain2)) %>% 
  mutate(comparison = paste(organ_strbin1, organ_strbin2, sep = " - ")) %>% 
  filter(comparison %in% c("LN WT - LN WT", "LN CA - LN WT", "LN CAKR - LN WT")) %>% 
  ggplot(aes(x = comparison, y = value)) +
  geom_dotplot(binaxis='y', stackdir='center', dotsize=0) + 
  geom_jitter(position=position_jitter(0.2), size = 2, aes(color = comparison)) +
  ggtitle("CD4 TRB LN") +
  stat_summary(fun = "mean",
               geom = "crossbar", 
               width = 0.5,
               colour = "black") +
  ylab("% repertoire overlap") + 
  theme_classic() + 
  ggtheme() + 
  theme(axis.text.x = element_text(angle = 90)) + 
  scale_color_manual(values = c("dodgerblue","indianred2","gray40")) + 
  ylim(c(0,0.20)) + xlab("")

#ggsave("./final_fig/overlaps/rep_overlap_cd4_trb_LN.png", width = 8, height = 8,  units = "cm")
#ggsave("./final_fig/overlaps/rep_overlap_cd4_trb_LN.eps", width = 8, height = 8,  units = "cm")

SessionInfo

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] SmartEDA_0.3.8    kableExtra_1.3.4  factoextra_1.0.7  pheatmap_1.0.12  
##  [5] immunarch_0.8.0   data.table_1.14.2 dtplyr_1.2.2      patchwork_1.1.2  
##  [9] cowplot_1.1.1     readxl_1.4.1      forcats_0.5.2     stringr_1.4.1    
## [13] dplyr_1.0.10      purrr_0.3.4       readr_2.1.2       tidyr_1.2.1      
## [17] tibble_3.1.8      ggplot2_3.3.6     tidyverse_1.3.2  
## 
## loaded via a namespace (and not attached):
##   [1] uuid_1.1-0          backports_1.4.1     circlize_0.4.15    
##   [4] fastmatch_1.1-3     systemfonts_1.0.4   plyr_1.8.7         
##   [7] igraph_1.3.5        digest_0.6.29       foreach_1.5.2      
##  [10] htmltools_0.5.3     viridis_0.6.2       ggalluvial_0.12.3  
##  [13] fansi_1.0.3         magrittr_2.0.3      googlesheets4_1.0.1
##  [16] cluster_2.1.4       doParallel_1.0.17   tzdb_0.3.0         
##  [19] graphlayouts_0.8.1  modelr_0.1.9        vroom_1.5.7        
##  [22] svglite_2.1.0       lpSolve_5.6.15      rmdformats_1.0.4   
##  [25] colorspace_2.0-3    rvest_1.0.3         ggrepel_0.9.1      
##  [28] haven_2.5.1         xfun_0.33           crayon_1.5.2       
##  [31] jsonlite_1.8.1      phangorn_2.10.0     iterators_1.0.14   
##  [34] ape_5.6-2           glue_1.6.2          polyclip_1.10-0    
##  [37] gtable_0.3.1        gargle_1.2.1        webshot_0.5.3      
##  [40] UpSetR_1.4.0        car_3.1-0           kernlab_0.9-31     
##  [43] shape_1.4.6         ISLR_1.4            prabclus_2.3-2     
##  [46] DEoptimR_1.0-11     abind_1.4-5         scales_1.2.1       
##  [49] GGally_2.1.2        DBI_1.1.3           rstatix_0.7.0      
##  [52] Rcpp_1.0.9          viridisLite_0.4.1   xtable_1.8-4       
##  [55] bit_4.0.4           mclust_5.4.10       stats4_4.2.1       
##  [58] sampling_2.9        httr_1.4.4          RColorBrewer_1.1-3 
##  [61] fpc_2.2-9           modeltools_0.2-23   ellipsis_0.3.2     
##  [64] reshape_0.8.9       pkgconfig_2.0.3     flexmix_2.3-18     
##  [67] farver_2.1.1        nnet_7.3-17         sass_0.4.2         
##  [70] ggseqlogo_0.1       dbplyr_2.2.1        utf8_1.2.2         
##  [73] labeling_0.4.2      tidyselect_1.1.2    rlang_1.0.6        
##  [76] reshape2_1.4.4      later_1.3.0         munsell_0.5.0      
##  [79] cellranger_1.1.0    tools_4.2.1         cachem_1.0.6       
##  [82] cli_3.4.1           generics_0.1.3      broom_1.0.1        
##  [85] evaluate_0.16       fastmap_1.1.0       yaml_2.3.5         
##  [88] bit64_4.0.5         knitr_1.40          fs_1.5.2           
##  [91] tidygraph_1.2.2     robustbase_0.95-0   ggraph_2.0.6       
##  [94] nlme_3.1-159        mime_0.12           xml2_1.3.3         
##  [97] compiler_4.2.1      shinythemes_1.2.0   rstudioapi_0.14    
## [100] ggsignif_0.6.3      reprex_2.0.2        tweenr_2.0.2       
## [103] bslib_0.4.0         stringi_1.7.8       highr_0.9          
## [106] lattice_0.20-45     Matrix_1.5-1        vctrs_0.4.2        
## [109] stringdist_0.9.8    pillar_1.8.1        lifecycle_1.0.2    
## [112] jquerylib_0.1.4     GlobalOptions_0.1.2 httpuv_1.6.6       
## [115] R6_2.5.1            bookdown_0.28       promises_1.2.0.1   
## [118] gridExtra_2.3       codetools_0.2-18    MASS_7.3-58.1      
## [121] assertthat_0.2.1    withr_2.5.0         rlist_0.4.6.2      
## [124] diptest_0.76-0      parallel_4.2.1      hms_1.1.2          
## [127] quadprog_1.5-8      grid_4.2.1          class_7.3-20       
## [130] rmarkdown_2.16      carData_3.0-5       googledrive_2.0.0  
## [133] ggpubr_0.4.0        ggforce_0.4.0       shiny_1.7.2        
## [136] lubridate_1.8.0